Section 5 Convex Optimisation
W. Dai (IC), EE4.66 Data Processing


2 Convex Combination

Definition 5.1. A convex combination is a linear combination of points in which all coefficients are non-negative and sum to 1. More specifically, let $x_1, x_2, \dots, x_l \in \mathbb{R}^n$. A convex combination of these points is of the form
$$\sum_{i=1}^{l} \lambda_i x_i,$$
where the real coefficients $\lambda_i$ satisfy $\lambda_i \ge 0$ and $\sum_{i=1}^{l} \lambda_i = 1$.

3 Convex Sets

Definition 5.2. A set is convex if and only if the convex combination of any two points in the set belongs to the set. That is, $X \subseteq \mathbb{R}^n$ is convex if and only if
$$\forall\, x_1, x_2 \in X,\ \forall\, \lambda \in [0,1]: \quad \lambda x_1 + (1-\lambda)\, x_2 \in X.$$

4 Examples

Examples of convex sets:
- A hyperplane $H = \{x : a^T x = b\}$, where $a \in \mathbb{R}^n$, $a \ne 0$, and $b \in \mathbb{R}$.
- A halfspace $H^+ = \{x : a^T x \le b\}$, where $a \in \mathbb{R}^n$, $a \ne 0$, and $b \in \mathbb{R}$.
- A polyhedron $P = \{x : a_j^T x \le b_j,\ j = 1, \dots, m;\ c_j^T x = d_j,\ j = 1, \dots, p\}$.

Intersections of convex sets are convex.

5 Convex Functions

Definition 5.3. The domain of a function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is defined as the set of points where $f$ is finite, i.e., $\mathrm{dom}\, f = \{x \in \mathbb{R}^n : f(x) < \infty\}$. Example: $\mathrm{dom}\, \log x = \{x : x > 0\}$.

Definition 5.4 (Convex functions). A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for any $x_1, x_2 \in \mathrm{dom}\, f \subseteq \mathbb{R}^n$ and any $\lambda \in [0,1]$, it holds that
$$f(\lambda x_1 + (1-\lambda)\, x_2) \le \lambda f(x_1) + (1-\lambda)\, f(x_2).$$
This definition implies that $\mathrm{dom}\, f$ is convex. However, in these lecture notes we usually assume $\mathrm{dom}\, f = \mathbb{R}^n$ for simplicity. A function $f$ is strictly convex if strict inequality holds whenever $x_1 \ne x_2$ and $\lambda \in (0,1)$.
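As a quick numerical illustration of Definition 5.4 (my addition, not part of the notes; $f(x) = \|x\|_2^2$ is an assumed test function), one can sample the defining inequality at random triples:

```python
import numpy as np

def convexity_holds_at(f, x1, x2, lam):
    """Check the defining inequality of convexity at one triple (x1, x2, lam)."""
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    return lhs <= rhs + 1e-12  # small tolerance for floating point

f = lambda x: np.dot(x, x)  # f(x) = ||x||_2^2, a convex test function
rng = np.random.default_rng(0)
checks = [convexity_holds_at(f, rng.standard_normal(5), rng.standard_normal(5),
                             rng.uniform()) for _ in range(1000)]
print(all(checks))  # True: the inequality holds at all sampled triples
```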

6 A Convex Function

[Figure: graph of a convex function, illustrating Definition 5.4.]

7 First-Order Condition of Convexity

Theorem 5.5. Suppose a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable. Then it is convex if and only if for all $x, y \in \mathrm{dom}\, f$, it holds that
$$f(y) \ge f(x) + \nabla f(x)^T (y - x). \tag{12}$$

8 Necessity

Assume first that $f$ is convex and $x, y \in \mathrm{dom}\, f$. Since $\mathrm{dom}\, f$ is convex, $x + t(y - x) \in \mathrm{dom}\, f$ for all $0 < t \le 1$. By convexity of $f$,
$$f(x + t(y - x)) \le (1 - t)\, f(x) + t f(y).$$
Divide both sides by $t$. It holds that
$$f(y) \ge f(x) + \frac{f(x + t(y - x)) - f(x)}{t}.$$
Taking the limit as $t \to 0$ yields (12).

9 Sufficiency

To show the other direction (sufficiency), assume that (12) holds. Choose any $x \ne y$ and $\lambda \in [0,1]$, and let $z = \lambda x + (1 - \lambda) y$. Applying (12) twice yields
$$f(x) \ge f(z) + \nabla f(z)^T (x - z), \qquad f(y) \ge f(z) + \nabla f(z)^T (y - z).$$
Multiply the first inequality by $\lambda$ and the second by $1 - \lambda$, and then add them together. It holds that
$$\lambda f(x) + (1 - \lambda) f(y) \ge f(z) + \lambda \nabla f(z)^T (x - z) + (1 - \lambda) \nabla f(z)^T (y - z).$$
Now note that $x - z = (1 - \lambda)(x - y)$ and $y - z = -\lambda (x - y)$, so the gradient terms cancel. One obtains
$$\lambda f(x) + (1 - \lambda) f(y) \ge f(z),$$
which proves that $f$ is convex.

10 Sublevel Sets

Definition 5.6 (Sublevel Sets, a.k.a. Lower Contour Sets). The $\alpha$-sublevel set of a function $f : \mathbb{R}^n \to \mathbb{R}$ is defined as
$$C_\alpha = \{x \in \mathrm{dom}\, f : f(x) \le \alpha\}.$$

11 Sublevel Sets of Convex Functions

Lemma 5.7. Sublevel sets of a convex function $f$ are convex.

Proof: We shall show that for all $x, y \in C_\alpha$ and all $\lambda \in [0,1]$, it holds that $\lambda x + (1 - \lambda) y \in C_\alpha$. By the definition of $C_\alpha$, $f(x) \le \alpha$ and $f(y) \le \alpha$. By the convexity of $f$,
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) \le \alpha,$$
which proves the lemma.

12 Sublevel Sets

The converse of Lemma 5.7 is not true: that all sublevel sets of a function $f$ are convex does not imply that $f$ is convex. For example, $f(x) = \sqrt{|x|}$ on $\mathbb{R}$ has convex sublevel sets (intervals) but is not convex.

13 Norm

We have seen the $\ell_p$-norm in Definition 4.7.

Definition 5.8. Given a vector space $V$ over the field $F$ of complex (real) numbers, a norm on $V$ is a function $p : V \to \mathbb{R}$ with the following properties: for all $a \in F$ and all $u, v \in V$,
1. $p(av) = |a|\, p(v)$ (absolute scalability);
2. $p(u + v) \le p(u) + p(v)$ (triangle inequality);
3. if $p(v) = 0$ then $v$ is the zero vector (separates points).

Positivity follows: by the first axiom, $p(0) = 0$ and $p(-v) = p(v)$. Then, by the triangle inequality,
$$0 = p(0) \le p(v) + p(-v) = 2\, p(v) \quad\Longrightarrow\quad 0 \le p(v).$$

14 Convexity of a Norm

Lemma 5.9. A norm is a convex function.

Proof: For any given $u, v \in \mathbb{R}^n$ and $\lambda \in [0,1]$, it holds that
$$\|\lambda u + (1 - \lambda) v\| \le \|\lambda u\| + \|(1 - \lambda) v\| = \lambda \|u\| + (1 - \lambda) \|v\|,$$
where we have used the triangle inequality and absolute scalability. This establishes the convexity of the norm.

15 $\ell_p$-Norm

Definition 4.7 mentioned that the $\ell_p$-norm is a proper norm if and only if $p \ge 1$. This can be verified using a sublevel-set argument.

16 Constrained Convex Optimization Problems

A constrained optimization problem of the form
$$\min_x\ f(x) \quad \text{subject to} \quad h_i(x) \le 0,\ i = 1, \dots, m, \qquad l_i(x) = 0,\ i = 1, \dots, r,$$
is convex if the objective function $f$ is convex and the feasible set is convex:
- The $h_i$'s are convex, so the feasible set is convex (a consequence of Lemma 5.7).
- The $l_i$'s are affine, i.e., of the form $l_i(x) = a_i^T x + b_i$. Indeed, $l_i(x) = 0 \iff l_i(x) \le 0$ and $-l_i(x) \le 0$, so both $l_i$ and $-l_i$ need to be convex, which forces $l_i$ to be affine.
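By way of illustration (my addition, not part of the notes), such a problem can be posed directly with a modelling package; a minimal sketch assuming the cvxpy package is available, with arbitrary example data:

```python
import cvxpy as cp
import numpy as np

# min ||x - x0||_2^2  s.t.  x >= 0, sum(x) = 1  (projection onto the simplex)
x0 = np.array([3.0, -1.0, 0.5])
x = cp.Variable(3)
objective = cp.Minimize(cp.sum_squares(x - x0))  # convex objective f
constraints = [x >= 0,                            # convex inequalities h_i(x) <= 0
               cp.sum(x) == 1]                    # affine equality l(x) = 0
cp.Problem(objective, constraints).solve()
print(x.value)  # the global optimum
```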

17 Local Optimality and Global Optimality

Theorem 5.10. Suppose that a feasible point $x$ is locally optimal for a convex optimization problem. Then it is also globally optimal.

Proof: Suppose that $x$ is not globally optimal, i.e., there exists a feasible $y \ne x$ such that $f(y) < f(x)$. Consider a point $z$ on the line segment between $x$ and $y$, i.e., $z = (1 - \lambda)\, x + \lambda y$, $\lambda \in (0,1)$. Then it is clear that
$$f(z) \le (1 - \lambda)\, f(x) + \lambda f(y) < f(x),$$
$$h_i(z) \le (1 - \lambda)\, h_i(x) + \lambda h_i(y) \le 0, \quad i = 1, \dots, m,$$
$$a_i^T z = (1 - \lambda)\, a_i^T x + \lambda a_i^T y = b_i, \quad i = 1, \dots, r,$$
where the inequalities follow from the convexity of $f$ and the $h_i$'s. Hence, the point $z$ is feasible and $f(z) < f(x)$ for all $\lambda \in (0,1)$. Taking $\lambda$ arbitrarily small places $z$ arbitrarily close to $x$, which contradicts the local optimality of $x$ and proves the global optimality of $x$.

18 A Global Optimality Criterion

Theorem 5.11. Suppose that the objective $f$ in a convex optimization problem is differentiable, so that by Theorem 5.5,
$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall\, x, y \in \mathrm{dom}\, f.$$
Let $X$ denote the feasible set
$$X = \{x : h_i(x) \le 0,\ i = 1, \dots, m;\ a_i^T x = b_i,\ i = 1, \dots, r\}.$$
Then a point $x \in X$ is optimal if and only if
$$\nabla f(x)^T (y - x) \ge 0, \quad \forall\, y \in X.$$

19 Consequence of Theorem 5.11

For an unconstrained convex optimization problem, the sufficient and necessary condition for a globally optimal point $x$ is
$$\nabla f(x) = 0.$$
In a constrained convex optimization problem, it may happen that $\nabla f(x) \ne 0$ at the optimum. This implies that $x$ is on the boundary of the feasible set. (This is linked to the KKT conditions and will be discussed later.)

20 Proof

The proof of sufficiency is straightforward. Suppose the inequality holds. Then for all $y \in X$,
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \ge f(x).$$
Hence, the point $x$ is globally optimal.

Conversely, suppose $x$ is optimal but the inequality does not hold, i.e., for some $y \in X$ we have $\nabla f(x)^T (y - x) < 0$. Consider the point $z(t) = t y + (1 - t)\, x$, $t \in [0,1]$. Clearly, $z(t)$ is feasible. Now
$$\frac{d}{dt} f(z(t)) \Big|_{t=0} = \nabla f(z(0))^T \frac{d}{dt} z(t) \Big|_{t=0} = \nabla f(x)^T (y - x) < 0,$$
where the inequality comes from the assumption. It implies that for small positive $t$ we have $f(z(t)) < f(x)$, which contradicts the optimality of $x$. The necessity is therefore proved.

21 Non-differentiable Functions: Subgradient

Definition 5.12. If $f : U \to \mathbb{R}$ is a convex function defined on a convex open set $U \subseteq \mathbb{R}^n$, a vector $v \in \mathbb{R}^n$ is called a subgradient at a point $x \in U$ if
$$f(y) - f(x) \ge v^T (y - x), \quad \forall\, y \in U.$$
The set of all subgradients at $x$ is called the subdifferential at $x$ and is denoted $\partial f(x)$.

Remark: If $f$ is convex and its subdifferential at $x$ contains exactly one subgradient, then $f$ is differentiable at $x$.

22 Example

$$f(x) = |x|, \qquad \partial f(x) = \begin{cases} \{1\} & \text{if } x > 0, \\ [-1, 1] & \text{if } x = 0, \\ \{-1\} & \text{if } x < 0. \end{cases}$$

23 Section 6 $\ell_1$-Minimization

24 $\ell_1$-Minimization

We want to solve the sparse linear inverse problem $y = Ax + e$.

Constrained optimization problem (if we know $\|e\|_2 \le \epsilon$):
$$\min_x\ \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \epsilon.$$
Unconstrained optimization problem (LASSO):
$$\min_x\ \tfrac{1}{2} \|Ax - y\|_2^2 + \lambda \|x\|_1.$$
There is a one-to-one correspondence between $\epsilon$ and $\lambda$: $\lambda \to 0$ corresponds to $\epsilon \to 0$, and $\lambda \to \infty$ corresponds to $\epsilon \to \infty$.

25 Why $\ell_1$-Minimization

A geometric intuition: the feasible solutions of $y = Ax$ form the affine set $x \in X = x_0 + \mathrm{Null}(A)$.

[Figure: geometric intuition for $\ell_1$-minimisation.]

26 Solve the LASSO Problem: Scalar Case

$$\min_x\ \underbrace{\tfrac{1}{2} (x - y)^2 + \lambda |x|}_{f(x)}.$$
The minimum of $f(x)$ is achieved at $x^\#$ such that $0 \in \partial f(x^\#)$, i.e.,
$$0 \in x^\# - y + \lambda \frac{d|x|}{dx}\Big|_{x^\#}, \quad \text{where} \quad \frac{d|x|}{dx} = \begin{cases} 1 & \text{if } x > 0, \\ [-1, 1] & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}$$
Or equivalently, $x^\#$ is given by the soft thresholding function
$$x^\# = \eta(y; \lambda) = \begin{cases} y - \lambda & \text{if } y \ge \lambda, \\ 0 & \text{if } -\lambda < y < \lambda, \\ y + \lambda & \text{if } y \le -\lambda. \end{cases}$$
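A minimal numpy sketch of the soft thresholding function $\eta$ (my illustration, not part of the notes); the vectorised form $\mathrm{sign}(y)\max(|y| - \lambda, 0)$ is equivalent to the three-case definition above:

```python
import numpy as np

def soft_threshold(y, lam):
    """Soft thresholding eta(y; lam) = sign(y) * max(|y| - lam, 0), elementwise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

print(soft_threshold(np.array([-2.0, -0.3, 0.0, 0.5, 3.0]), 1.0))
# [-1. -0.  0.  0.  2.]  -- entries inside [-1, 1] are set to zero
```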

27 Vector Case: the Gradient Descent Method

Gradient descent method: to solve $\min_x f(x)$, one iteratively updates
$$x^k = x^{k-1} - t_k \nabla f(x^{k-1}),$$
where $t_k > 0$ is a suitable stepsize.

For the LASSO problem $\frac{1}{2} \|y - Ax\|_2^2 + \lambda \|x\|_1$, the (sub)gradient is given by (see details on page 6-22)
$$-A^T (y - Ax) + \lambda\, \partial \|x\|_1.$$
The (sub)gradient is not unique! Which one should one choose? The optimal choice may depend on $t_k$.

28 Gradient Descent Method: Another View

In the gradient descent method, $x^k = x^{k-1} - t_k \nabla f(x^{k-1})$. This is equivalent to minimizing an approximation $\tilde f$ of $f$:
$$x^k = \arg\min_x \tilde f(x),$$
where
$$\tilde f(x) := f(x^{k-1}) + \big\langle x - x^{k-1},\, \nabla f(x^{k-1}) \big\rangle + \frac{1}{2 t_k} \|x - x^{k-1}\|_2^2 = \frac{1}{2 t_k} \big\|x - \big(x^{k-1} - t_k \nabla f(x^{k-1})\big)\big\|_2^2 + c.$$

29 Iterative Shrinkage Thresholding (IST)

To solve $\min_x f(x) + \lambda \|x\|_1$, we apply proximal regularization:
$$x^k = \arg\min_x\ \Big\{ f(x^{k-1}) + \big\langle x - x^{k-1},\, \nabla f(x^{k-1}) \big\rangle + \frac{1}{2 t_k} \|x - x^{k-1}\|_2^2 + \lambda \|x\|_1 \Big\},$$
where the objective in braces equals
$$\frac{1}{2 t_k} \big\|x - \big(x^{k-1} - t_k \nabla f(x^{k-1})\big)\big\|_2^2 + \lambda \|x\|_1 + c = \sum_i \Big[\frac{1}{2 t_k} (x_i - z_i)^2 + \lambda |x_i|\Big] + c,$$
with $z = x^{k-1} - t_k \nabla f(x^{k-1})$. The minimization decouples coordinate-wise, so
$$x^k = \eta\big(x^{k-1} + t_k A^T (y - A x^{k-1});\ \lambda t_k\big).$$
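Putting the pieces together, a minimal IST loop in numpy (my sketch, not from the notes; it assumes the fixed stepsize $t = 1/\|A\|_2^2$ and arbitrary example data):

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Iterative shrinkage thresholding for min 0.5*||y - Ax||_2^2 + lam*||x||_1."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2            # stepsize 1/L, L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x + t * A.T @ (y - A @ x)              # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)  # soft threshold
    return x

# Example: recover a 5-sparse vector from 40 Gaussian measurements
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x0 = np.zeros(100); x0[rng.choice(100, 5, replace=False)] = rng.standard_normal(5)
x_hat = ista(A, A @ x0, lam=0.01)
print(np.linalg.norm(x_hat - x0))  # small reconstruction error for this instance
```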

30 Stable Recovery of Exact Sparse Signals

Theorem 6.1. Let $S$ be such that $\delta_{4S} \le \tfrac{1}{2}$. Then for any signal $x_0$ supported on $T_0$ with $|T_0| \le S$ and any perturbation $e$ with $\|e\|_2 \le \epsilon$, the solution $x^\#$ obeys
$$\|x^\# - x_0\|_2 \le C_S\, \epsilon,$$
where the constant $C_S$ depends only on $\delta_{4S}$. Typical values: $C_S \approx 8.82$ for $\delta_{4S} = \tfrac{1}{5}$; the constant for $\delta_{4S} = \tfrac{1}{4}$ is larger.

31 Stable Recovery of Approximately Sparse Signals

Theorem 6.2. Suppose that $x_0$ is an arbitrary vector in $\mathbb{R}^n$ and let $x_{0,S}$ be the truncated vector corresponding to the $S$ largest values of $x_0$ (in absolute value). When the matrix $A$ satisfies the RIP, the solution $x^\#$ obeys
$$\|x^\# - x_0\|_2 \le C_{1,S}\, \epsilon + C_{2,S}\, \frac{\|x_0 - x_{0,S}\|_1}{\sqrt{S}}.$$
Typical values of $C_{1,S}$ and $C_{2,S}$ for $\delta_{4S} = \tfrac{1}{5}$: $C_{2,S} \approx 8.77$.

32 Interpretation

Compressible signals: the entries obey a power law $|x_0|_{(k)} \le c\, k^{-r}$, where $|x_0|_{(k)}$ is the $k$-th largest value of $x_0$ (in magnitude) and $r > 1$.

Consider the noiseless case. Suppose that a genie tells us the true signal $x_0$. The best $S$-term approximation $x_{0,S}$ gives a distortion
$$\|x_0 - x_{0,S}\|_2 \le c'\, S^{-r + 1/2} \asymp \frac{\|x_0 - x_{0,S}\|_1}{\sqrt{S}}.$$
(Computation details are given in the Appendix on page 6-23.)

Compare this result with Theorem 6.2: there is no algorithm performing fundamentally better than $\ell_1$-min.

33 Proof for Exact Sparse Signals

Let $x^\# = x_0 + h$.

Tube constraint:
$$\|Ah\|_2 = \|A x^\# - A x_0\|_2 \le \|A x^\# - y\|_2 + \|A x_0 - y\|_2 \le 2\epsilon.$$
Cone constraint: $\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1$.

Proof:
$$\|x_0\|_1 \ge \|x^\#\|_1 = \|x_0 + h\|_1 = \|(x_0 + h)_{T_0}\|_1 + \|h_{T_0^c}\|_1 \ge \|x_0\|_1 - \|h_{T_0}\|_1 + \|h_{T_0^c}\|_1.$$

34 Geometric Interpretation

[Figure, built up over four slides: geometric interpretation of the tube and cone constraints.]

38 Proof

Since $\|Ah\|_2 \le 2\epsilon$, we want to show $\|h\|_2 \lesssim \|Ah\|_2$. (This is not true in general: for example, $Ah = 0$ while $\|h\|_2$ can be arbitrarily large.)

Divide $T_0^c$ into subsets of size $M$ ($M = 3 |T_0|$). List the entries in $T_0^c$ as $n_1, \dots, n_{N - |T_0|}$ in decreasing order of their magnitudes. Set
$$T_j = \{n_l : (j-1) M + 1 \le l \le j M\}.$$
Hence $T_1$ contains the indices of the $M$ largest entries (in magnitude) of $h_{T_0^c}$, $T_2$ contains the indices of the next $M$ largest entries, and so on. Define $\rho = |T_0| / M$ (so $\rho = 1/3$ when $M = 3 |T_0|$).

39 Some Observations

The $k$-th largest value of $h_{T_0^c}$ obeys
$$|h_{T_0^c}|_{(k)} \le \frac{\sum_{l=1}^{k} |h_{T_0^c}|_{(l)}}{k} \le \frac{\|h_{T_0^c}\|_1}{k}.$$
Similarly, every entry of $h_{T_{j+1}}$ is bounded by the average magnitude on $T_j$:
$$|h_{T_{j+1}}(k)| \le \frac{\|h_{T_j}\|_1}{M}.$$

40 Proof: Step 1

The $\ell_2$-norm of $h$ concentrates on $T_{01} = T_0 \cup T_1$:
$$\|h\|_2^2 = \|h_{T_{01}}\|_2^2 + \|h_{T_{01}^c}\|_2^2 \le (1 + \rho)\, \|h_{T_{01}}\|_2^2.$$
Proof: From $|h_{T_0^c}|_{(k)} \le \|h_{T_0^c}\|_1 / k$, it holds that
$$\|h_{T_{01}^c}\|_2^2 \overset{(a)}{\le} \|h_{T_0^c}\|_1^2 \sum_{k=M+1}^{N} \frac{1}{k^2} \le \frac{\|h_{T_0^c}\|_1^2}{M} \overset{(b)}{\le} \frac{\|h_{T_0}\|_1^2}{M} \overset{(c)}{\le} \frac{|T_0|\, \|h_{T_0}\|_2^2}{M} = \rho\, \|h_{T_0}\|_2^2,$$
where (a) holds as $\sum_{k=M+1}^{N} 1/k^2 \le 1/M$, (b) is from the $\ell_1$-cone constraint, and (c) comes from the Cauchy-Schwarz inequality. Since $\|h_{T_0}\|_2 \le \|h_{T_{01}}\|_2$, the claim follows.

41 Proof: Step 2 - A Technical Result

$$\sum_{j \ge 2} \|h_{T_j}\|_2 \le \sqrt{\rho}\, \|h_{T_0}\|_2.$$
Proof: By construction, $|h_{T_{j+1}}(k)| \le \|h_{T_j}\|_1 / M$. Then
$$\|h_{T_{j+1}}\|_2^2 = \sum_{k \in T_{j+1}} |h_{T_{j+1}}(k)|^2 \le M \cdot \frac{\|h_{T_j}\|_1^2}{M^2} = \frac{\|h_{T_j}\|_1^2}{M},$$
i.e., $\|h_{T_{j+1}}\|_2 \le \|h_{T_j}\|_1 / \sqrt{M}$. Hence,
$$\sum_{j \ge 2} \|h_{T_j}\|_2 \overset{(a)}{\le} \sum_{j' \ge 1} \frac{\|h_{T_{j'}}\|_1}{\sqrt{M}} = \frac{\|h_{T_0^c}\|_1}{\sqrt{M}} \overset{(b)}{\le} \frac{\|h_{T_0}\|_1}{\sqrt{M}} \overset{(c)}{\le} \sqrt{\frac{|T_0|}{M}}\, \|h_{T_0}\|_2 = \sqrt{\rho}\, \|h_{T_0}\|_2,$$
where (a) uses the variable change $j' = j - 1$, and (b) and (c) follow from the cone constraint and the Cauchy-Schwarz inequality, respectively.

42 Proof: Step 3

$$\|Ah\|_2 = \Big\|A_{T_{01}} h_{T_{01}} + \sum_{j \ge 2} A_{T_j} h_{T_j}\Big\|_2 \ge \|A_{T_{01}} h_{T_{01}}\|_2 - \sum_{j \ge 2} \|A_{T_j} h_{T_j}\|_2$$
$$\ge \sqrt{1 - \delta_{|T_0| + M}}\, \|h_{T_{01}}\|_2 - \sqrt{1 + \delta_M} \sum_{j \ge 2} \|h_{T_j}\|_2 \ge \underbrace{\Big(\sqrt{1 - \delta_{4S}} - \sqrt{\rho}\, \sqrt{1 + \delta_{4S}}\Big)}_{C_{4S}} \|h_{T_{01}}\|_2.$$
Hence,
$$\|h\|_2 \le \sqrt{1 + \rho}\, \|h_{T_{01}}\|_2 \le \frac{\sqrt{1 + \rho}}{C_{4S}}\, \|Ah\|_2 \le \frac{\sqrt{1 + \rho}}{C_{4S}}\, 2\epsilon.$$

43 Face Recognition with Block Occlusion

[Figure: face images with block occlusion, from Wright et al., 2009.]

44 The Setup

A set of training samples $\{\phi_i, l_i\}$:
- $\phi_i \in \mathbb{R}^m$ is the vector representation of the $i$-th image;
- $l_i \in \{1, 2, \dots, C\}$ is the label, for $C$ subjects.

A test sample $y$.

Assumption: for simplicity, assume good face alignment.

45 Face Recognition via Sparse Linear Regression

Sufficiently many images of the same subject $i$ form a low-dimensional linear subspace in $\mathbb{R}^m$, or equivalently
$$y \approx \sum_{\{j :\, l_j = i\}} \phi_j c_j =: \Phi_i c_i.$$
Stacking all subjects,
$$y \approx [\Phi_1, \Phi_2, \dots, \Phi_C]\, c = \Phi c \in \mathbb{R}^m, \quad \text{where } c = [\dots, 0^T, c_i^T, 0^T, \dots]^T.$$
The $\ell_1$-minimisation formulation for face recognition:
$$\min_c \|c\|_1 \quad \text{s.t.} \quad \|y - \Phi c\|_2 \le \epsilon.$$

46 Robust Face Recognition

With corruption and occlusion, $y \not\approx \Phi c$. Instead, $y \approx \Phi c + e$, where $e$ is an unknown error vector whose entries can be very large.

Assumption: only a fraction of the pixels is corrupted (up to about 70% in some cases).

Robust face recognition formulation:
$$\min_{c, e} \|c\|_1 + \|e\|_1 \quad \text{s.t.} \quad y = \Phi c + e,$$
or
$$\min_w \|w\|_1 \quad \text{s.t.} \quad y = [\Phi, I]\, w.$$

47 Gradient Computation

Definition 6.3 (Gradient). $\nabla f(x) := \big[\tfrac{d}{dx_1} f, \dots, \tfrac{d}{dx_n} f\big]^T$.

Example 6.4. Let $f(x) = \tfrac{1}{2} \|y - Ax\|_2^2$. Then $\nabla f = -A^T (y - Ax)$.

Useful identities: $\frac{d}{dx} a^T x = \frac{d}{dx} x^T a = a$ and $\frac{d}{dx} x^T A^T A x = 2 A^T A x$. Hence
$$f(x) = \tfrac{1}{2} x^T A^T A x - y^T A x + \tfrac{1}{2} y^T y, \qquad \frac{d}{dx} f = A^T A x - A^T y = -A^T (y - Ax).$$
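A quick numerical sanity check of Example 6.4 (my addition, with arbitrary random data): compare the closed-form gradient against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)); y = rng.standard_normal(6); x = rng.standard_normal(4)

f = lambda x: 0.5 * np.sum((y - A @ x) ** 2)
grad_closed = -A.T @ (y - A @ x)               # closed-form gradient from Example 6.4

# central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.max(np.abs(grad_closed - grad_fd)))   # ~1e-9: the two gradients agree
```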

48 Sparse Approximation Error for Compressible Signals

Let $|x_0|_{(k)} \le c\, k^{-r}$. Then
$$\|x_0 - x_{0,S}\|_1 = \sum_{k=S+1}^{n} |x_0|_{(k)} \le \sum_{k=S+1}^{\infty} c\, k^{-r} \le \int_S^{\infty} c\, x^{-r}\, dx = c'\, S^{-r+1},$$
$$\|x_0 - x_{0,S}\|_2^2 = \sum_{k=S+1}^{n} |x_0|_{(k)}^2 \le \sum_{k=S+1}^{\infty} c^2\, k^{-2r} \le c''\, S^{-2r+1}.$$
Hence $\|x_0 - x_{0,S}\|_2 \le c'''\, S^{-r + 1/2}$.

49 Section 7 Low Rank Matrix Recovery

50 Netflix Problem

[Table: example ratings matrix with viewers J. Cameron, C. Eastwood, P. Jackson, Roman Polanski and films Black Swan, Titanic, True Grit, The King's Speech; most entries are unobserved.]

51 Blind Deconvolution [Ahmed, Recht, and Romberg, 2013]

$$y = s * h: \qquad y[n] = \sum_{l=0}^{L} s[n - l]\, h[l].$$

[Figure: blurred image and the result after deblurring.]

52 Low Rank Matrices and Approximations

Consider a matrix $X_0 \in \mathbb{R}^{m \times n}$ with SVD
$$X_0 = \sum_{k=1}^{K} \sigma_k u_k v_k^T,$$
where $K = \min(m, n)$ and $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_K \ge 0$.

Theorem 7.1 (The Eckart-Young Theorem). The best low-rank approximation of $X_0$, i.e., the solution of
$$\min_X \|X - X_0\|_F^2 \quad \text{s.t.} \quad \mathrm{rank}(X) = R,$$
is given by simply truncating the SVD:
$$\hat X = \sum_{k=1}^{R} \sigma_k u_k v_k^T.$$
Remark: $\|X\|_F^2 = \sum_{i,j} X_{i,j}^2 = \|\mathrm{vec}(X)\|_2^2$.
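A minimal numpy illustration of Theorem 7.1 (my sketch, with random example data): truncating the SVD gives the best rank-$R$ approximation in Frobenius norm.

```python
import numpy as np

def best_rank_R(X0, R):
    """Best rank-R approximation of X0 in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]

rng = np.random.default_rng(0)
X0 = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))  # rank 3
X0 += 0.01 * rng.standard_normal(X0.shape)                       # small noise
print(np.linalg.norm(X0 - best_rank_R(X0, 3), 'fro'))  # ~ the noise level
```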

53 Low Rank Matrix Recovery

Let $\mathcal{A} : \mathbb{R}^{m \times n} \to \mathbb{R}^L$ be a linear measurement operator that takes $L$ inner products with predefined matrices $A_1, \dots, A_L$:
$$\mathcal{A} : X_0 \mapsto y, \qquad y_l = \langle X_0, A_l \rangle = \mathrm{trace}\big(A_l^T X_0\big) = \sum_{i=1}^{m} \sum_{j=1}^{n} X_0[i,j]\, A_l[i,j].$$
The low-rank matrix recovery problem is given by
$$\min_X \|y - \mathcal{A}(X)\|_2^2 \quad \text{s.t.} \quad \mathrm{rank}(X) \le R.$$
Example 7.2. In the Netflix problem, each $A_l$ has a single non-zero entry: $A_l[i,j] = 1$ and $A_l[s,t] = 0$ for all $[s,t] \ne [i,j]$.
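A small sketch (my addition) of the sampling operator of Example 7.2 and its adjoint; `Omega` is an assumed name for the list of observed index pairs:

```python
import numpy as np

def A_op(X, Omega):
    """y_l = X[i_l, j_l]: inner products with single-entry matrices A_l."""
    return np.array([X[i, j] for (i, j) in Omega])

def A_adj(y, Omega, shape):
    """Adjoint operator: place each measurement back at its entry, zeros elsewhere."""
    X = np.zeros(shape)
    for (i, j), v in zip(Omega, y):
        X[i, j] += v
    return X
```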

54 Another Look at the Linear Operator $\mathcal{A}$

$$\mathcal{A} : \mathbb{R}^{m \times n} \to \mathbb{R}^L, \qquad X \mapsto y = A\, \mathrm{vec}(X), \quad \text{where } A \in \mathbb{R}^{L \times (m n)}.$$

55 Alternating Projection

To solve
$$\min_X \|y - \mathcal{A}(X)\|_2^2 \quad \text{s.t.} \quad \mathrm{rank}(X) \le R$$
is the same as to look for factors $L \in \mathbb{R}^{m \times R}$ and $R \in \mathbb{R}^{n \times R}$ such that
$$\min_{L, R}\ \|y - \mathcal{A}(L R^T)\|_2^2.$$
Alternating projection:
$$R_{k+1} = \arg\min_R \|y - \mathcal{A}(L_k R^T)\|_2^2, \qquad L_{k+1} = \arg\min_L \|y - \mathcal{A}(L R_{k+1}^T)\|_2^2.$$

56 Alternating Projection (2)

Details on fixing $L$ and updating $R$: for each column $j$, let $I_j$ be the set of observed row indices in that column; then row $j$ of $R$ is obtained from the least-squares fit
$$X_0[I_j, j] = L_{I_j,:}\, R_{j,:}^T.$$
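A compact sketch of this alternating least-squares update for matrix completion (my illustration, under the column-wise least-squares reading sketched above):

```python
import numpy as np

def als_complete(X, mask, R, n_iter=50):
    """Alternating least squares for matrix completion.
    X: observed matrix (zeros where unobserved), mask: boolean observed-entry pattern."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    L = rng.standard_normal((m, R))
    Rf = rng.standard_normal((n, R))
    for _ in range(n_iter):
        for j in range(n):                     # fix L, update row j of the R factor
            I = mask[:, j]
            Rf[j] = np.linalg.lstsq(L[I], X[I, j], rcond=None)[0]
        for i in range(m):                     # fix the R factor, update row i of L
            J = mask[i, :]
            L[i] = np.linalg.lstsq(Rf[J], X[i, J], rcond=None)[0]
    return L @ Rf.T

# Example: complete a rank-2 matrix from ~60% of its entries
rng = np.random.default_rng(1)
X0 = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
mask = rng.uniform(size=X0.shape) < 0.6
X_hat = als_complete(np.where(mask, X0, 0.0), mask, R=2)
print(np.linalg.norm(X_hat - X0) / np.linalg.norm(X0))  # small relative error
```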

57 Nuclear Norm Minimization

Define the nuclear norm
$$\|X\|_* = \sum_{k=1}^{\min(m,n)} \sigma_k,$$
which is the $\ell_1$-norm of the singular value vector.

Constrained optimization problem:
$$\min_X \|X\|_* \quad \text{s.t.} \quad \|y - \mathcal{A}(X)\|_2^2 \le \epsilon.$$
Unconstrained optimization problem:
$$\min_X \tfrac{1}{2} \|y - \mathcal{A}(X)\|_2^2 + \lambda \|X\|_*.$$

58 $\ell_1$-norm and Nuclear Norm

$\ell_1$-norm: write $x = \sum_{i=1}^{n} x_i e_i$, where $e_i$ is the $i$-th natural basis vector, so $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. At a point $x$ with all $x_i \ne 0$,
$$\partial \|x\|_1 = \Big\{\sum_{i=1}^{n} \mathrm{sign}(x_i)\, e_i\Big\} = \{v : v_i = \mathrm{sign}(x_i)\}.$$
Nuclear norm: $X = \sum_{i=1}^{\min(m,n)} \sigma_i u_i v_i^T$ and $\|X\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i$. For $X$ of rank $r$, the analogue of $\sum_i \mathrm{sign}(\sigma_i)\, u_i v_i^T$ is
$$\partial \|X\|_* = \big\{U_r V_r^T + U_{m-r}\, T\, V_{n-r}^T : T \in \mathbb{R}^{(m-r) \times (n-r)},\ \sigma_1(T) \le 1\big\},$$
where $U_{m-r}$ and $V_{n-r}$ span the orthogonal complements of the column spaces of $U_r$ and $V_r$.

59 Soft Thresholding Function

$\ell_1$-norm minimization with given $z \in \mathbb{R}^n$: let
$$\hat x = \arg\min_x \tfrac{1}{2} \|x - z\|_2^2 + \lambda \|x\|_1.$$
Then $\hat x = \sum_i \eta(z_i; \lambda)\, e_i$, where $\eta(z_i; \lambda) = \mathrm{sign}(z_i) \max(0, |z_i| - \lambda)$.

Nuclear norm minimization with given $Z \in \mathbb{R}^{m \times n}$: let
$$\hat X = \arg\min_X \tfrac{1}{2} \|X - Z\|_F^2 + \lambda \|X\|_*.$$
Then $\hat X = \sum_{i=1}^{\min(m,n)} \eta(\sigma_i; \lambda)\, u_i v_i^T$, where $\eta(\sigma_i; \lambda) = \mathrm{sign}(\sigma_i) \max(0, \sigma_i - \lambda)$.
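A minimal numpy sketch of singular value soft thresholding (my illustration), reusing the scalar soft threshold on the spectrum:

```python
import numpy as np

def svt(Z, lam):
    """Singular value soft thresholding: prox of lam*||.||_* at Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 5))
print(np.linalg.svd(svt(Z, 1.0), compute_uv=False))  # spectrum shrunk toward zero
```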

60 ISTA

$\ell_1$ case: $\min \tfrac{1}{2} \|y - Ax\|_2^2 + \lambda \|x\|_1$.
$$\nabla_x \tfrac{1}{2} \|y - Ax\|_2^2 = -A^T (y - Ax), \qquad \tilde f = \frac{1}{2 t_k} \big\|x - \big(x^{k-1} - t_k \nabla f\big)\big\|_2^2 + \lambda \|x\|_1,$$
$$x^k = \eta\big(x^{k-1} + t_k A^T (y - A x^{k-1});\ \lambda t_k\big).$$
Nuclear norm case: $\min \tfrac{1}{2} \|y - \mathcal{A}(X)\|_2^2 + \lambda \|X\|_*$.
$$\nabla_X \tfrac{1}{2} \|y - \mathcal{A}(X)\|_2^2 = -\mathcal{A}^*(y - \mathcal{A}(X)), \qquad \tilde f = \frac{1}{2 t_k} \big\|X - \big(X^{k-1} - t_k \nabla f\big)\big\|_F^2 + \lambda \|X\|_*,$$
$$X^k = \eta_\sigma\big(X^{k-1} + t_k\, \mathcal{A}^*(y - \mathcal{A}(X^{k-1}));\ \lambda t_k\big).$$

61 Iterative Hard Thresholding Algorithm

Sparse case:
$$\min \tfrac{1}{2} \|y - Ax\|_2^2 \ \text{s.t.}\ \|x\|_0 \le S, \qquad x^k = H_S\big(x^{k-1} + \mu_k A^T (y - A x^{k-1})\big).$$
Low-rank case:
$$\min \tfrac{1}{2} \|y - \mathcal{A}(X)\|_2^2 \ \text{s.t.}\ \mathrm{rank}(X) \le R, \qquad X^k = H_{R,\sigma}\big(X^{k-1} + t_k\, \mathcal{A}^*(y - \mathcal{A}(X^{k-1}))\big).$$
Here $H_S$ keeps the $S$ largest-magnitude entries and zeroes the rest, and $H_{R,\sigma}$ truncates the SVD to rank $R$.
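A minimal sketch of the two hard thresholding operators (my illustration):

```python
import numpy as np

def hard_threshold(x, S):
    """H_S: keep the S largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-S:]
    out[idx] = x[idx]
    return out

def hard_threshold_rank(X, R):
    """H_{R,sigma}: truncate the SVD of X to rank R (best rank-R approximation)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]
```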

62 Comments on Performance Guarantees

When $\mathcal{A}(\cdot)$ is a Gaussian random 'projection', an RIP condition holds with high probability:
$$(1 - \delta)\, \|X\|_F^2 \le \|\mathcal{A}(X)\|_2^2 \le (1 + \delta)\, \|X\|_F^2, \quad \forall X \ \text{s.t.}\ \mathrm{rank}(X) \le R.$$
Matrix completion is difficult when $X$ is both low-rank and sparse. One wants the coherence constant to be small:
$$\mu(U) := \frac{N}{R} \max_{1 \le i \le N} \|P_U e_i\|_2^2 = O(1).$$

63 Blind Deconvolution: The Problem

Given a convolution of two signals,
$$y[n] = \sum_{l=0}^{L} s[n - l]\, h[l],$$
what are $s[n]$ and $h[n]$?

This bilinear problem is difficult to solve; note in particular the scaling ambiguity ($s, h$ and $\alpha s, h / \alpha$ give the same $y$).

64 Blind Deconvolution: The Idea

Each entry of $y = s * h$ is a sum along a skew diagonal of the rank-1 matrix $s h^T$, i.e., $y = \mathcal{A}(s h^T)$ for a linear operator $\mathcal{A}$. Blind deconvolution therefore becomes a low-rank recovery problem:
$$\min_X \|X\|_* \quad \text{s.t.} \quad y = \mathcal{A}(X).$$
