Total Variation Denoising with Overlapping Group Sparsity

Ivan W. Selesnick and Po-Yu Chen
Polytechnic Institute of New York University
Brooklyn, New York
selesi@poly.edu

Abstract

This paper describes an extension to total variation denoising wherein it is assumed that the first-order difference function of the unknown signal is not only sparse, but also that large values of the first-order difference function rarely occur in isolation. This approach is designed to alleviate the staircase artifact that often arises in total variation based solutions. A convex cost function is given and an iterative algorithm is derived using majorization-minimization. The algorithm converges quickly and is computationally efficient because it uses fast solvers for banded systems.

Total variation denoising

Data model: signal plus noise

$$y = x + w, \qquad y, x, w \in \mathbb{R}^N$$

x : signal whose derivative is sparse
w : white Gaussian noise

Definition of total variation (TV) denoising:

$$x^{*} = \arg\min_{x \in \mathbb{R}^N} \; \frac{1}{2} \sum_n |y(n) - x(n)|^2 + \lambda \sum_n |x(n) - x(n-1)| \tag{1}$$

Total variation denoising (TVD) is suitable for piecewise-constant signals (i.e., signals with a sparse derivative function).

For 1-D TV denoising, the exact solution can be obtained by a direct algorithm [1].

[1] L. Condat. A direct algorithm for 1D total variation denoising. Tech. rep. http://hal.archives-ouvertes.fr/. Hal-00675043, 2012.
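To make the trade-off in (1) concrete, the following minimal sketch evaluates the TV cost for two candidate signals. The function name `tv_cost` and the sample values are illustrative assumptions, not part of the original slides.

```python
def tv_cost(x, y, lam):
    """Objective of (1): 0.5 * sum |y(n) - x(n)|^2 + lam * sum |x(n) - x(n-1)|."""
    fidelity = 0.5 * sum((yn - xn) ** 2 for xn, yn in zip(x, y))
    total_variation = sum(abs(x[n] - x[n - 1]) for n in range(1, len(x)))
    return fidelity + lam * total_variation


# Illustrative (made-up) data: a noisy step and a piecewise-constant candidate.
y = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9]
flat = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# x = y has zero data misfit but pays the penalty for every small wiggle.
print("cost of x = y   :", tv_cost(y, y, lam=1.0))
# The piecewise-constant candidate has a small misfit and a single jump.
print("cost of x = flat:", tv_cost(flat, y, lam=1.0))
```

With these values the piecewise-constant candidate has the lower cost, which is the behavior the sparse-derivative model is meant to favor.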

Applications of total variation (TV) [12]

1. denoising [2, 3, 4]
2. deconvolution [5, 6, 7]
3. reconstruction [8]
4. nonlinear decomposition [9, 10]
5. compressed sensing [11]

[2] A. Chambolle. An algorithm for total variation minimization and applications. In: J. of Math. Imaging and Vision 20 (2004), pp. 89-97.
[3] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. In: Physica D 60 (1992), pp. 259-268.
[4] R. Chartrand and V. Staneva. Total variation regularisation of images corrupted by non-Gaussian noise using a quasi-Newton method. In: Image Processing, IET 2.6 (Dec. 2008), pp. 295-303. issn: 1751-9659. doi: 10.1049/iet-ipr:20080017.
[5] J. Oliveira, J. Bioucas-Dias, and M. A. T. Figueiredo. Adaptive total variation image deblurring: A majorization-minimization approach. In: Signal Processing 89.9 (Sept. 2009), pp. 1683-1693. doi: 10.1016/j.sigpro.2009.03.018.
[6] S. Osher et al. An Iterative Regularization Method for Total Variation Based Image Restoration. In: Multiscale Model. & Simul. 4.2 (2005), pp. 460-489.
[7] J. Bect et al. A l1-Unified Variational Framework for Image Restoration. In: European Conference on Computer Vision, Lecture Notes in Computer Sciences. Ed. by T. Pajdla and J. Matas. Vol. 3024. 2004, pp. 1-13.
[8] Y. Wang et al. A new alternating minimization algorithm for total variation image reconstruction. In: SIAM J. on Imaging Sciences 1.3 (2008), pp. 248-272.
[9] L. A. Vese and S. Osher. Image denoising and decomposition with total variation minimization and oscillatory functions. In: J. Math. Imag. Vis. 20 (2004), pp. 7-18.
[10] J.-L. Starck et al. Morphological component analysis. In: Proceedings of SPIE. Vol. 5914 (Wavelets XI). 2005.
[11] W. Yin et al. Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing. In: SIAM J. Imag. Sci. 1.1 (2008), pp. 143-168. doi: 10.1137/070703983. url: http://link.aip.org/link/?sii/1/143/1.
[12] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. In: Physica D 60 (1992), pp. 259-268.

[Figure: Staircase artifacts. For each of two test signals, three panels: test signal; test signal plus noise; TV denoising.]

Staircase artifacts

TV denoising works well for piecewise-constant signals. For piecewise-smooth signals, however, TV denoising produces staircase artifacts.

Several generalizations of TVD have been developed to address the staircase artifact [13, 14, 15, 16, 17].

[13] P. Rodriguez and B. Wohlberg. Efficient Minimization Method for a Generalized Total Variation Functional. In: IEEE Trans. Image Process. 18.2 (Feb. 2009), pp. 322-332. issn: 1057-7149. doi: 10.1109/TIP.2008.2008420.
[14] Y. Hu and M. Jacob. Higher Degree Total Variation (HDTV) Regularization for Image Recovery. In: IEEE Trans. Image Process. 21.5 (May 2012), pp. 2559-2571. issn: 1057-7149. doi: 10.1109/TIP.2012.2183143.
[15] K. Bredies, K. Kunisch, and T. Pock. Total generalized variation. In: SIAM J. Imag. Sci. 3.3 (2010), pp. 492-526. doi: 10.1137/090769521. eprint: http://epubs.siam.org/doi/pdf/10.1137/090769521. url: http://epubs.siam.org/doi/abs/10.1137/090769521.
[16] F. I. Karahanoglu, I. Bayram, and D. Van De Ville. A Signal Processing Approach to Generalized 1-D Total Variation. In: IEEE Trans. Signal Process. 59.11 (Nov. 2011), pp. 5265-5274. issn: 1053-587X. doi: 10.1109/TSP.2011.2164399.
[17] S.-H. Lee and M. G. Kang. Total Variation-Based Image Noise Reduction With Generalized Fidelity Function. In: IEEE Signal Processing Letters 14.11 (Nov. 2007), pp. 832-835. issn: 1070-9908. doi: 10.1109/LSP.2007.901697.

Introduction

We propose group-sparse total variation: the signal derivative exhibits group sparsity.

Signal model: the derivative of x(t) is sparse, and large values of the derivative are rarely isolated (i.e., large values usually arise near, or adjacent to, other large values).

1. Problem formulated via convex sparse optimization
2. Grouping/clustering of the signal derivative promoted by a suitable penalty function
3. Group locations unknown
4. Translation-invariant denoising

Algorithm:

1. Majorization-minimization (MM) optimization method [18].
2. Fast algorithm for group-sparse TVD uses fast solvers for banded systems.
3. No algorithm parameters

[18] M. Figueiredo, J. Bioucas-Dias, and R. Nowak. Majorization-minimization algorithms for wavelet-based image restoration. In: IEEE Trans. Image Process. 16.12 (Dec. 2007), pp. 2980-2991.

Notation

An N-point signal x(n) is denoted

$$x = [x(0), \dots, x(N-1)]^T \in \mathbb{R}^N.$$

The first-order difference of an N-point signal x is given by Dx, where D is the (N-1) × N matrix

$$D = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{bmatrix}. \tag{2}$$

A K-point group of the vector v will be denoted by

$$v_{n,K} = [v(n), \dots, v(n+K-1)] \in \mathbb{R}^K. \tag{3}$$

$v_{n,K}$ is a block of K contiguous samples of v starting at index n.
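The notation above can be sketched in a few lines of plain Python. The helper names `diff` and `group` are hypothetical, and treating samples outside the vector as zero is an assumption made here for the boundary groups; the slides do not state a boundary convention.

```python
def diff(x):
    """Apply D of (2): returns the (N-1)-point vector with entries x(n+1) - x(n)."""
    return [x[n + 1] - x[n] for n in range(len(x) - 1)]


def group(v, n, K):
    """K-point group v_{n,K} of (3); samples outside v are taken as zero (assumption)."""
    return [v[i] if 0 <= i < len(v) else 0.0 for i in range(n, n + K)]


x = [1.0, 1.0, 4.0, 6.0, 6.0]
v = diff(x)                 # first-order difference, length N - 1
print(v)
print(group(v, 1, 2))       # block of K = 2 contiguous samples starting at n = 1
```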

Group-sparse total variation (GS-TV) denoising

Signal plus noise

$$y = x + w, \qquad y, x, w \in \mathbb{R}^N$$

x : signal whose derivative is group sparse
w : white Gaussian noise

$$x^{*} = \arg\min_{x \in \mathbb{R}^N} \left\{ F(x) = \frac{1}{2} \| y - x \|_2^2 + \lambda\, \phi(Dx) \right\} \tag{4}$$

To promote group sparsity of $v = Dx \in \mathbb{R}^{N-1}$, define $\phi : \mathbb{R}^{N-1} \to \mathbb{R}$,

$$\phi(v) = \sum_n \left[ \sum_{k=0}^{K-1} |v(n+k)|^2 \right]^{1/2}. \tag{5}$$

K : group size.

If K = 1, then $\phi(v) = \|v\|_1$ and problem (4) is the standard 1D total variation denoising problem.

We refer to problem (4) as group-sparse total variation (GS-TV) denoising.
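A small sketch of the penalty (5) shows why it favors clustered differences. The function name `gs_penalty` is hypothetical, and the boundary handling (summing over every group that overlaps v, with zeros outside) is an assumption rather than something the slides specify.

```python
import math


def gs_penalty(v, K):
    """phi(v) of (5): sum of the l2 norms of all K-point groups of v.

    Assumption: the sum runs over every group overlapping v, with samples
    outside v taken as zero.
    """
    M = len(v)
    total = 0.0
    for n in range(-(K - 1), M):
        total += math.sqrt(sum(v[i] ** 2 for i in range(n, n + K) if 0 <= i < M))
    return total


# Two difference vectors with the same l1 norm (illustrative values).
clustered = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # large values adjacent
isolated = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]    # large values far apart

print(gs_penalty(clustered, 1), gs_penalty(isolated, 1))  # K = 1: plain l1 norm, equal
print(gs_penalty(clustered, 2), gs_penalty(isolated, 2))  # K = 2: clustered is cheaper
```

For K = 1 the two vectors cost the same, while for K = 2 the clustered vector is penalized less, which is the grouping behavior the penalty is designed to promote.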

Majorization-minimization (MM) algorithm

We use MM to derive a computationally efficient, fast converging algorithm to minimize F(x).

Using (3), the penalty φ(v) is

$$\phi(v) = \sum_n \| v_{n,K} \|_2. \tag{6}$$

To find a majorizor of F(x) defined in (4), we first find a majorizor of φ(v). First, note

$$\frac{1}{2 \|u\|_2} \|v\|_2^2 + \frac{1}{2} \|u\|_2 \;\geq\; \|v\|_2, \qquad u \neq 0. \tag{7}$$

Using (7) for each group, a majorizor of φ(v) is given by

$$g(v, u) = \frac{1}{2} \sum_n \left[ \frac{1}{\|u_{n,K}\|_2} \|v_{n,K}\|_2^2 + \|u_{n,K}\|_2 \right]$$

with

$$g(v, u) \geq \phi(v), \qquad g(u, u) = \phi(u) \tag{8}$$

provided $\|u_{n,K}\|_2 \neq 0$ for all n.
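Inequality (7) follows from $(\|v\|_2 - \|u\|_2)^2 \geq 0$ after dividing by $2\|u\|_2$. The sketch below checks it numerically for a few vectors; the helper names `norm2` and `majorizer_term` and the test vectors are illustrative assumptions.

```python
import math


def norm2(a):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(ai * ai for ai in a))


def majorizer_term(v, u):
    """Left-hand side of (7); requires u != 0."""
    return norm2(v) ** 2 / (2.0 * norm2(u)) + 0.5 * norm2(u)


u = [3.0, 4.0]
for v in ([3.0, 4.0], [0.0, 0.0], [1.0, -2.0], [10.0, 0.0]):
    # The quadratic term never falls below the norm, and touches it at v = u.
    print(v, "norm:", norm2(v), "majorizer:", majorizer_term(v, u))
```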

[Figure: the penalty φ(u) and its quadratic majorizor g(u).]

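Note that g(v, u) is quadratic in v. After some manipulations, it can be written as

$$g(v, u) = \frac{1}{2} v^T \Lambda(u)\, v + C \tag{9}$$

where

C : does not depend on v
Λ(u) : diagonal matrix with entries

$$[\Lambda(u)]_{n,n} = \sum_{j=0}^{K-1} \left[ \sum_{k=0}^{K-1} |u(n - j + k)|^2 \right]^{-1/2}. \tag{10}$$

The entries of Λ are easily computed.

Using (9), a majorizor of F(x) is given by

$$G(x, u) = \frac{1}{2} \| y - x \|_2^2 + \lambda\, g(Dx, Du) \tag{11}$$

$$\phantom{G(x, u)} = \frac{1}{2} \| y - x \|_2^2 + \frac{\lambda}{2} x^T D^T \Lambda(Du)\, D x + \lambda C, \tag{12}$$

i.e.,

$$G(x, u) \geq F(x), \qquad G(u, u) = F(u). \tag{13}$$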
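The diagonal entries in (10) can be computed from the group norms: entry n sums the reciprocal norms of the K groups that contain sample n. The sketch below is one way to do this in plain Python; `lambda_diag` is a hypothetical name, and zero-padding outside the vector is an assumption about the boundary.

```python
import math


def lambda_diag(u, K):
    """Diagonal of Lambda(u) per (10); assumes every group of u is nonzero."""
    M = len(u)
    # r[m] is the l2 norm of the K-point group starting at index m - (K - 1),
    # with samples outside u taken as zero (assumption).
    r = [math.sqrt(sum(u[i] ** 2 for i in range(s, s + K) if 0 <= i < M))
         for s in range(-(K - 1), M)]
    # Sample n belongs to the groups stored at r[n], ..., r[n + K - 1].
    return [sum(1.0 / r[n + k] for k in range(K)) for n in range(M)]


u = [3.0, 4.0]
lam_d = lambda_diag(u, 2)
print(lam_d)
# Consistency with (8) and (9): g(u, u) = phi(u) implies
# sum_n Lambda_nn * u(n)^2 equals the sum of the group norms.
print(sum(l * un ** 2 for l, un in zip(lam_d, u)))
```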

To minimize F(x), majorization-minimization (MM) defines an iterative algorithm via

$$x^{(i+1)} = \arg\min_x\, G(x, x^{(i)})$$

where i is the iteration index. The iteration is initialized with some $x^{(0)}$.

Here, the MM iteration gives

$$x^{(i+1)} = \arg\min_x\, \| y - x \|_2^2 + \lambda\, x^T D^T \Lambda(Dx^{(i)})\, D x, \tag{14}$$

which has the solution

$$x^{(i+1)} = \left( I + \lambda D^T \Lambda(Dx^{(i)})\, D \right)^{-1} y \tag{15}$$

where the diagonal matrix $\Lambda(Dx^{(i)})$ depends on $Dx^{(i)}$ per (10).
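The step from (14) to (15) can be spelled out by setting the gradient to zero. In this short worked derivation, J(x) is a symbol introduced here for the objective in (14), and Λ abbreviates $\Lambda(Dx^{(i)})$.

```latex
\begin{aligned}
J(x) &= \| y - x \|_2^2 + \lambda\, x^{T} D^{T} \Lambda D x, \\
\nabla J(x) &= -2\,(y - x) + 2 \lambda\, D^{T} \Lambda D x = 0 \\
\Longrightarrow \quad (I + \lambda D^{T} \Lambda D)\, x &= y .
\end{aligned}
```

Because Λ is diagonal with positive entries and λ > 0, the matrix $I + \lambda D^T \Lambda D$ is symmetric positive definite, so the linear system has a unique solution.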

Problem: some diagonal entries of $\Lambda(Dx^{(i)})$ go to infinity as $Dx^{(i)}$ becomes sparse (divide by zero).

Solution [19]: use the matrix inverse lemma (MIL)

$$\left( I + \lambda D^T \Lambda(Dx^{(i)})\, D \right)^{-1} = I - D^T \left( \frac{1}{\lambda} \Lambda^{-1}(Dx^{(i)}) + D D^T \right)^{-1} D. \tag{16}$$

Using (16), update (15) becomes

$$x^{(i+1)} = y - D^T \Big( \underbrace{\tfrac{1}{\lambda} \Lambda^{-1}(Dx^{(i)}) + D D^T}_{\text{banded}} \Big)^{-1} D y. \tag{17}$$

Equation (17) constitutes an algorithm for GS-TV denoising (4).

The large system matrix in (17) is banded (in fact, tridiagonal). Therefore, fast solvers for banded systems [20] can be used.

The algorithm requires no user parameters (no step size parameters, etc.).

[19] M. Figueiredo et al. On total-variation denoising: A new majorization-minimization algorithm and an experimental comparison with wavelet denoising. In: Proc. IEEE Int. Conf. Image Processing. 2006.
[20] W. H. Press et al. Numerical recipes in C: the art of scientific computing (2nd ed.). Cambridge University Press, 1992. isbn: 0-521-43108-5. Sect. 2.4.
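The tridiagonal structure can be made explicit. Writing F for the matrix inverted in (17) (a label adopted here for convenience), and using the rows of D from (2), a short worked computation gives:

```latex
D D^{T} =
\begin{bmatrix}
 2 & -1 &        &        \\
-1 &  2 & \ddots &        \\
   & \ddots & \ddots & -1 \\
   &        & -1     &  2
\end{bmatrix}
\in \mathbb{R}^{(N-1) \times (N-1)},
\qquad
F = \tfrac{1}{\lambda} \Lambda^{-1} + D D^{T}
\;\Longrightarrow\;
F_{n,n} = \frac{1}{\lambda\, [\Lambda]_{n,n}} + 2, \quad
F_{n,n \pm 1} = -1 .
```

When an entry $[\Lambda]_{n,n}$ grows without bound, the corresponding term $1 / (\lambda [\Lambda]_{n,n})$ simply tends to zero, so F stays well defined; this is why (17) avoids the divide-by-zero issue of (15).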

Group-sparse total variation (GS-TV) denoising algorithm

input: y, K, λ
1. x = y (initialization)
2. b = Dy
repeat
    3. u = Dx
    4. $[\Lambda]_{n,n} = \sum_{j=0}^{K-1} \left[ \sum_{k=0}^{K-1} |u(n - j + k)|^2 \right]^{-1/2}$
    5. $F = \frac{1}{\lambda} \Lambda^{-1} + D D^T$ (F is tridiagonal)
    6. $x = y - D^T (F^{-1} b)$ (use fast solver)
until convergence
return: x
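The listing above can be sketched in plain Python as follows. This is not the authors' MATLAB software: the function names, the fixed iteration count `n_iter` (standing in for "until convergence"), the zero-padded boundary groups, and the use of the Thomas algorithm as the tridiagonal solver are all assumptions made for illustration. It takes b = Dy, as (17) requires.

```python
import math


def solve_f(diag, rhs):
    """Solve F z = rhs for tridiagonal F with main diagonal `diag` and -1 on
    both off-diagonals (Thomas algorithm, no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = -1.0 / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] + c[i - 1]          # diag[i] - (-1) * c[i-1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom  # rhs[i] - (-1) * d[i-1]
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def gstv_denoise(y, K, lam, n_iter=50):
    """Sketch of the GS-TV iteration (17); needs len(y) >= 2.
    n_iter is a hypothetical stand-in for a convergence test."""
    N, M = len(y), len(y) - 1
    x = list(y)                                        # step 1
    b = [y[n + 1] - y[n] for n in range(M)]            # step 2: b = D y
    for _ in range(n_iter):
        u = [x[n + 1] - x[n] for n in range(M)]        # step 3: u = D x
        # Norms of all K-point groups overlapping u (zeros outside: assumption).
        r = [math.sqrt(sum(u[i] ** 2 for i in range(s, s + K) if 0 <= i < M))
             for s in range(-(K - 1), M)]
        diag = []
        for n in range(M):                             # steps 4 and 5
            norms = r[n:n + K]
            # 1 / (lam * Lambda_nn); it is 0 when a group norm is 0.
            inv = 0.0 if min(norms) == 0.0 else 1.0 / (lam * sum(1.0 / t for t in norms))
            diag.append(inv + 2.0)                     # main diagonal of F
        z = solve_f(diag, b)
        # Step 6: x = y - D^T z, where (D^T z)(n) = z(n-1) - z(n).
        x = [y[n] - ((z[n - 1] if n > 0 else 0.0) - (z[n] if n < M else 0.0))
             for n in range(N)]
    return x


# Illustrative (made-up) noisy ramp-and-plateau signal.
y = [0.0, 0.2, -0.1, 1.0, 2.1, 2.9, 3.0, 3.1]
print(gstv_denoise(y, K=3, lam=1.0))
```

Working with $1 / (\lambda [\Lambda]_{n,n})$ directly, rather than forming Λ and inverting it, keeps the update finite when a group of the difference signal is exactly zero.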

Example 1

TV and group-sparse TV denoising.

Simple synthetic test signal + white Gaussian noise

TV denoising: staircase artifacts
GS-TV denoising (K = 3): much less staircase behavior, smaller root-mean-square error (RMSE)

TV promotes sparsity of Dx but does not promote any grouping or clustering tendencies (see figure).

GS-TV promotes group sparsity of Dx: large values are adjacent to other large values (see figure). The group-sparse penalty function smooths the sparse derivative signal.

The algorithm converges rapidly. The cost function decreases monotonically.

[Figure: Example 1. Panels: a) test signal; b) test signal plus noise; c) TV denoising; d) group-sparse TV denoising, group size K = 3; e) first-order difference, TV denoising; f) first-order difference, group-sparse TV denoising. The RMSE of each denoised signal is reported in panels c) and d).]

Example 2

TV and group-sparse TV denoising.

1. The signal is row 256 of the lena image (512 × 512).
2. Group-sparse TV denoising gives less artificial blockiness. (The signal around n = 300 is shown in detail.)
3. Group-sparse TV denoising has improved RMSE (6.41 compared with 6.85).
4. To examine the effect of group size K and regularization parameter λ, we computed the RMSE as a function of λ for group sizes from 1 through 10. The minimal RMSE is obtained for group size K = 6 and λ = 2.6.

[Figure: RMSE as a function of λ, one curve per group size K = 1 through 10.]

[Figure: Example 2. Panels: signal; signal plus noise; total variation denoising, λ = 7.6 (RMSE = 6.853); group-sparse TV denoising, group size K = 6, λ = 2.6 (RMSE = 6.41); TV denoising (detail); group-sparse TVD (detail).]

Conclusion

1. GS-TV extends TV denoising to signals wherein the first-order difference function is not only sparse, but also exhibits a basic form of structured sparsity: large values of the first-order difference function rarely occur in isolation.
2. This approach is intended to alleviate the staircase (blocking) artifact that often arises in total variation based solutions.
3. A convex cost function is proposed and a fast converging, computationally efficient algorithm is derived. The algorithm harnesses fast solvers for banded systems.
4. How should suitable parameters K and λ be chosen based on minimal knowledge of the signal characteristics?
5. Group-sparse TV denoising for images...
6. Non-convex penalty functions for enhanced group-sparsity...

MATLAB software available at http://eeweb.poly.edu/iselesni/gstv/