Synthesis and Analysis Sparse Representation Models for Image Restoration Shuhang Gu 顾舒航 Dept. of Computing The Hong Kong Polytechnic University
Outline Sparse representation models for image modeling Synthesis based representation model Analysis based representation model Synthesis & analysis models for image modeling Weighted nuclear norm and its applications in low level vision Low rank models Weighted nuclear norm minimization (WNNM) WNNM for image denoising WNNM-RPCA and WNNM-MC and their applications Convolutional sparse coding for single image super-resolution Convolutional sparse coding (CSC) CSC for single image super resolution 2
Synthesis and analysis sparse representation models 3
Synthesis based sparse representation model The synthesis based sparse representation model assumes that a signal x can be represented as a linear combination of a small number of atoms chosen from a dictionary D: x = Dα, s.t. ‖α‖₀ ≤ ε (figure: a dense solution vs. a sparse solution) Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse Problems 2007. 4
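The ℓ₀-constrained synthesis coding above is NP-hard in general; greedy pursuit is a standard way to approximate it. A minimal sketch of orthogonal matching pursuit (assuming a dictionary with unit-norm columns; the function and parameter names are illustrative, not from the slides):

```python
import numpy as np

def omp(D, x, sparsity):
    """Greedy orthogonal matching pursuit: approximately solve
    min_a ||x - D a||_2  s.t. ||a||_0 <= sparsity,
    for a dictionary D (n x m) with unit-norm columns."""
    residual = x.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # least-squares refit of the coefficients on the current support
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef
        residual = x - D @ alpha
    return alpha
```

With an orthonormal dictionary a k-sparse signal is recovered exactly, since the correlations equal the true coefficients.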
Analysis based sparse representation model The analysis model generates representation coefficients by a simple multiplication with an operator P, and assumes these coefficients are sparse: ‖Px‖₀ ≤ ε Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse Problems 2007. 5
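A toy illustration of the analysis view (a sketch; the first-order difference operator stands in for a learned P): a piecewise-constant signal is itself dense, but its analysis coefficients Px are sparse.

```python
import numpy as np

# A piecewise-constant signal: dense in the signal domain
x = np.concatenate([np.full(5, 1.0), np.full(5, 3.0), np.full(5, 0.5)])

# P: first-order finite-difference analysis operator (14 x 15)
P = np.eye(14, 15, k=1) - np.eye(14, 15)

beta = P @ x                       # analysis coefficients
n_nonzero = int(np.count_nonzero(np.abs(beta) > 1e-12))
# beta is non-zero only at the two jump locations of x
```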
S&A representation models for image modeling A geometry perspective Synthesis model: x = Dα, where α is sparse. Analysis model: β = Px, where β is sparse. The synthesis model emphasizes the non-zero values in the sparse coefficient vector α, because these non-zero entries select the dictionary atoms that span the subspace of the input signal. The analysis model emphasizes the zero values in the coefficient vector Px, because these zero entries select the rows of the projection matrix that span the complement of the input signal's subspace (a hyperplane). Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse Problems 2007. 6
S&A representation models for image modeling Image restoration/enhancement problems Image denoising: y = x + n Image super-resolution: y = D(k⊗x) + n Image deconvolution: y = k⊗x + n Image inpainting: y = M∘x + n 7
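The four degradation models can be simulated in a few lines (a numpy sketch; the 2×2 box blur, downsampling factor 2, mask density, and noise level are arbitrary illustrative choices, not the slides' settings):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))                       # latent clean image
n = 0.01 * rng.standard_normal((8, 8))       # additive Gaussian noise

# Denoising: y = x + n
y_dn = x + n

# Deconvolution: y = k (*) x + n, with a 2x2 box blur k (periodic boundary)
k = np.full((2, 2), 0.25)
blurred = sum(np.roll(np.roll(x, -a, 0), -b, 1) * k[a, b]
              for a in range(2) for b in range(2))
y_dc = blurred + n

# Super-resolution: y = D(k (*) x) + n -- blur then downsample by 2
y_sr = blurred[::2, ::2] + n[::2, ::2]

# Inpainting: y = M o x + n, with a random binary observation mask M
M = (rng.random((8, 8)) > 0.3).astype(float)
y_ip = M * x + n
```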
S&A representation models for image modeling Priors for image restoration Sparsity priors Non-local similarity priors Color line priors Buades A, Coll B, Morel JM. A non-local algorithm for image denoising. In CVPR 2005. 8
S&A representation models for image modeling Sparsity prior MAP estimation: argmax_x p(x|y) = p(y|x)p(x). Minimizing −log p(x|y): x = argmin_x (1/2)‖x − y‖_F² − log p(x), where the first term comes from the Gaussian likelihood assumption and the second term is the data prior. In the original signal domain the distribution is not discriminative enough; in a transformation domain (analysis model, prior φ(Px)) or a decomposition domain (synthesis model, prior ψ(α)), a long-tail distribution leads to a sparse solution. 9
S&A representation models for image modeling Synthesis model: min_α (1/2)‖y − Dα‖_F² + ψ(α), x = Dα. Representative methods: KSVD, BM3D, LSSC, NCSR, etc. Pros: the synthesis model can be more sparse; easier to embed the non-local prior. Cons: patch prior modeling needs aggregation; time consuming. Analysis model: min_x (1/2)‖y − x‖_F² + φ(Px). Representative methods: TV, wavelet methods, FRAME, FOE, CSF, TRD, etc. Pros: free of patch division; efficient in the inference phase; easier to learn task-specific priors. Cons: hard to embed the non-local prior; not as sparse as the synthesis model. 10
S&A representation models for image modeling Patch based vs. filter based implementations Synthesis model — methods: KSVD, BM3D, LSSC, NCSR, etc. Pros: can be more sparse; easier to embed the non-local prior. Cons: patch prior modeling needs aggregation; time consuming. Analysis model — patch based methods: Analysis-KSVD, et al.; filter based methods: TV, wavelet methods, FOE, CSF, TRD, etc. Pros: free of patch division; efficient in the inference phase; easier to learn task-specific priors. Cons: hard to embed the non-local prior; not as sparse as the synthesis model. 12
S&A representation models for image modeling Synthesis model, patch based: embeds the non-local prior; models texture/details better. Analysis model, patch based: embeds the non-local prior; models structure better. Synthesis model, filter based: aggregation free; models texture/details better. Analysis model, filter based: aggregation free; models structure better. Different applications may be better solved by different models! Notes: Aggregation: overlap aggregation may smooth the image or generate ringing artifacts. Non-local prior: the non-local prior helps to generate visually plausible results in highly noisy situations. 14
S&A representation models for image modeling The weighted nuclear norm minimization denoising model: an analysis model with a patch based implementation; embeds the non-local prior; the analysis model is good at structure modeling → denoising. The convolutional sparse coding super-resolution model: a synthesis model with a filter based implementation; aggregation free; the synthesis model is good at texture modeling → SR. 15
Weighted nuclear norm minimization and its applications in low level vision 16
Low rank models Matrix factorization methods: min_{U,V} loss(Y − X) s.t. X = UV. The loss function is determined by the noise model: Gaussian noise model: PCA, probabilistic PCA; sparse noise model: robust PCA; partial observations: matrix completion; complex noise model: MoG, etc. 17
Low rank models Regularization methods: min_X loss(Y − X) + R(X). A commonly used regularization term is the nuclear norm of the matrix X: ‖X‖_* = Σ_i σ_i(X) = ‖σ(X)‖₁ Pros: exact recovery property (theoretically proved); the nuclear norm proximal problem has a closed-form solution. Character: the regularization method balances fidelity and low-rankness via a parameter, while the factorization method sets an upper bound on the rank. Candès, Emmanuel J., et al. Robust principal component analysis? Journal of the ACM, 2011. Candès, Emmanuel J., and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009. Cai, J. F., Candès, E. J., & Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010. 18
Low rank models Regularization methods: a 2D analysis sparse perspective. Analysis sparse model: min_x (1/2)‖y − x‖_F² + φ(Px). Nuclear norm regularization model: min_X (1/2)‖Y − X‖_F² + λ‖UᵀXV‖₁. The nuclear norm regularization model can be interpreted as a 2D analysis sparse model! 19
Weighted nuclear norm minimization Nuclear norm proximal: min_X (1/2)‖X − Y‖_F² + λ‖X‖_*, with the closed-form solution X̂ = U S_λ(Σ) Vᵀ, where Y = UΣVᵀ is the SVD of Y. Pros: tightest convex envelope of rank minimization; closed-form solution. Cons: treats all singular values equally, ignoring the different significances of matrix singular values. Cai, J. F., Candès, E. J., & Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010. 20
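The nuclear norm proximal solution (singular value soft-thresholding, Cai et al. 2010) is a few lines of numpy; a sketch, with the (1/2)‖·‖_F² fidelity convention:

```python
import numpy as np

def svt(Y, lam):
    """Singular value thresholding: closed-form solution of
    min_X (1/2)*||X - Y||_F^2 + lam*||X||_*.
    Soft-threshold each singular value of Y by lam."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
```

Note how a sufficiently large threshold zeroes out small singular values, so the result is genuinely low rank rather than just a shrunken matrix.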
Weighted nuclear norm minimization Weighted nuclear norm: ‖X‖_{w,*} = Σ_i w_i σ_i(X). Weighted nuclear norm proximal (WNNP): X̂ = argmin_X (1/2)‖X − Y‖_F² + ‖X‖_{w,*} Difficulties: the WNNP problem is not convex for general weight vectors, and the sub-gradient method cannot be used to analyze its optimization. 21
Weighted nuclear norm minimization Theorem 1. For Y ∈ R^{m×n}, let Y = UΣVᵀ be its SVD. The optimal solution of the WNNP problem X̂ = argmin_X (1/2)‖X − Y‖_F² + ‖X‖_{w,*} is X̂ = UDVᵀ, where D is a diagonal matrix whose diagonal entries d = [d₁, d₂, …, d_r] (r = min(m, n)) are determined by: min_{d₁,…,d_r} Σ_{i=1}^{r} [(1/2)(d_i − σ_i)² + w_i d_i] s.t. d₁ ≥ d₂ ≥ … ≥ d_r ≥ 0. 22
Weighted nuclear norm minimization Corollary 1. If the weights satisfy 0 ≤ w₁ ≤ w₂ ≤ … ≤ w_r, the non-convex WNNP problem has the closed-form optimal solution: X̂ = U S_w(Σ) Vᵀ where Y = UΣVᵀ is the SVD of Y, and S_w(Σ)_ii = max(Σ_ii − w_i, 0). 23
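Corollary 1 translates directly into code. A sketch, valid only for the non-descending weights the corollary assumes, and with the (1/2)‖·‖_F² fidelity convention so the thresholds are exactly the w_i:

```python
import numpy as np

def wsvt(Y, w):
    """Weighted singular value thresholding (Corollary 1): closed-form
    solution of min_X (1/2)*||X - Y||_F^2 + ||X||_{w,*} when the weights
    satisfy 0 <= w_1 <= ... <= w_r (non-descending order)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # numpy returns singular values in descending order, so a
    # non-descending w shrinks small singular values more aggressively
    return U @ np.diag(np.maximum(s - np.asarray(w), 0.0)) @ Vt
```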
WNNM for image denoising 1. For each noisy patch, search the image for its nonlocal similar patches to form the matrix Y. 2. Solve the WNNM problem X̂ = argmin_X (1/2)‖X − Y‖_F² + ‖X‖_{w,*} to estimate the clean patches X from Y. 3. Put the clean patches back into the image. 4. Repeat the above procedure several times to obtain the denoised image. 24
WNNM for image denoising Weights setting: a reweighting strategy to promote sparsity, w_i = C / (σ_i(X) + ε) The model still has only one parameter C, and the reweighting introduces little extra computational burden. 25
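One pass of the patch-grouping procedure with this reweighting can be sketched as follows. This is illustrative only: the patch size, stride, search radius, group size K, and C below are hypothetical settings, and the paper's iterative regularization and noise-adaptive weight estimation are omitted.

```python
import numpy as np

def wnnm_denoise_pass(img, psize=6, stride=4, search=10, K=16, C=1.5):
    """Simplified nonlocal WNNM denoising pass: for each reference patch,
    stack its K most similar patches as columns of Y, shrink Y's singular
    values with reweighted thresholds w_i = C/(sigma_i + eps), and
    aggregate the denoised patches back by averaging."""
    H, W = img.shape
    acc = np.zeros_like(img)   # sum of denoised patch estimates
    cnt = np.zeros_like(img)   # per-pixel aggregation counts
    # reference positions; append the last valid offset for full coverage
    ys = sorted(set(list(range(0, H - psize + 1, stride)) + [H - psize]))
    xs = sorted(set(list(range(0, W - psize + 1, stride)) + [W - psize]))
    for i in ys:
        for j in xs:
            ref = img[i:i+psize, j:j+psize].ravel()
            cands = []
            for a in range(max(0, i - search), min(H - psize, i + search) + 1):
                for b in range(max(0, j - search), min(W - psize, j + search) + 1):
                    p = img[a:a+psize, b:b+psize].ravel()
                    # tie-break by spatial distance so the reference patch
                    # itself always stays in the group
                    cands.append((np.sum((p - ref) ** 2),
                                  (a - i) ** 2 + (b - j) ** 2, a, b, p))
            cands.sort(key=lambda t: (t[0], t[1]))
            group = cands[:K]
            Y = np.stack([p for *_, p in group], axis=1)  # patches as columns
            # weighted singular value shrinkage of the patch group
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            X = U @ np.diag(np.maximum(s - C / (s + 1e-8), 0.0)) @ Vt
            for (_, _, a, b, _), col in zip(group, X.T):
                acc[a:a+psize, b:b+psize] += col.reshape(psize, psize)
                cnt[a:a+psize, b:b+psize] += 1
    cnt[cnt == 0] = 1          # guard against uncovered pixels
    return acc / cnt
```

On a constant image the patch group is rank one, so the shrinkage leaves it almost unchanged.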
WNNM for image denoising Denoising experimental results 26
WNNM-RPCA min_{X,E} ‖E‖₁ + ‖X‖_{w,*} s.t. Y = X + E Synthetic experiment: X, Y ∈ R^{m×m}, rank(X) = P_r·m, ‖E‖₀ = P_e·m² 27
WNNM-RPCA 28
WNNM-MC min_X ‖X‖_{w,*} s.t. Y = X + E, P_Ω(E) = 0 Synthetic experiment: X, Y ∈ R^{m×m}, rank(X) = P_r·m, ‖E‖₀ = P_e·m² 29
WNNM-MC 30
WNNM summary We analyzed the weighted nuclear norm proximal (WNNP) problem. Based on WNNP, we proposed a new image denoising algorithm and achieved state-of-the-art performance. We then extended the weighted nuclear norm to WNNM-RPCA and WNNM-MC; WNNM achieved superior performance to NNM in both applications. 31
Convolutional sparse coding for single image super-resolution 32
Convolutional sparse coding Consistency constraint 33
Convolutional sparse coding Aggregation methods in patch based algorithms (figure: noisy input; non-overlapping, center pixel, and overlapping aggregation) EPLL: min_{X,{Z_i}} ‖X − Y‖² + Σ_i (‖R_i X − Z_i‖² + P(Z_i)) Zoran D, Weiss Y. From learning models of natural image patches to whole image restoration. In: ICCV 2011. 34
Convolutional sparse coding Sparse coding: min_α ‖y − Dα‖_F² + φ(α) Convolutional sparse coding: min_{z} ‖Y − Σ_i f_i ⊗ z_i‖_F² + Σ_i φ(z_i) (matrix form) M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus. Deconvolutional networks. In CVPR, 2010. 35
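The synthesis step in the CSC objective — the whole image as a sum of filters convolved with feature maps — can be evaluated with circular convolutions in the Fourier domain, as FFT-based CSC solvers do. A sketch (function name illustrative; periodic boundary assumed):

```python
import numpy as np

def csc_reconstruct(filters, feature_maps):
    """Evaluate sum_i f_i (*) z_i with circular (periodic) boundary
    conditions: zero-pad each filter to the image size and multiply
    the spectra in the Fourier domain."""
    H, W = feature_maps[0].shape
    out = np.zeros((H, W))
    for f, z in zip(filters, feature_maps):
        F = np.fft.fft2(f, s=(H, W))   # filter zero-padded to image size
        out += np.real(np.fft.ifft2(F * np.fft.fft2(z)))
    return out
```

Because the reconstruction acts on the whole image at once, there are no overlapping patches to aggregate, which is exactly the property exploited in the SR pipeline below.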
Convolutional sparse coding for image SR The training phase: (1) LR filter learning via CSC, producing N LR filters and N LR feature maps; (2) mapping function learning from the N LR feature maps to M HR feature maps; (3) joint HR filter and mapping function learning. The testing phase: estimate the HR feature maps from the LR input using the N LR filters and the mapping function, then convolve with the M HR filters. 36
Convolutional sparse coding for image SR The training phase: pre-processing 37
Convolutional sparse coding for image SR The training phase: LR filter training B. Wohlberg. Efficient convolutional sparse coding. In ICASSP, 2014. 38
Convolutional sparse coding for image SR The training phase: joint HR filter and mapping function learning 39
Convolutional sparse coding for image SR The testing phase: HR feature map estimation with the N LR filters and the mapping function, followed by convolution with the M HR filters 40
Convolutional sparse coding for image SR Optimization: SA-ADMM The original problem can be written as: L. W. Zhong and J. T. Kwok. Fast stochastic alternating direction method of multipliers. In ICML, 2013. 41
Convolutional sparse coding for image SR Optimization: SA-ADMM 42
Convolutional sparse coding for image SR Experimental results (figures) 43
CSC-SR: summary and future work Summary To avoid patch aggregation in super-resolution, we utilize convolutional sparse coding to deal with the SR problem. The SA-ADMM algorithm is used to train the CSC-SR model on large scale training data. State-of-the-art SR results with high PSNR and visual quality. Future work An end-to-end training strategy may work better. Is there an optimization algorithm more suitable for CSC training? 47
Related Publications and References Related Publications S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted Nuclear Norm Minimization with Application to Image Denoising. In CVPR 2014. Q. Xie, D. Meng, S. Gu, L. Zhang, W. Zuo, X. Feng, and Z. Xu. On the Optimal Solution of Weighted Nuclear Norm Minimization. Technical report. S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, L. Zhang. Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision. Submitted to IJCV (minor revision). S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, L. Zhang. Convolutional Sparse Coding for Image Super-resolution. In ICCV 2015. References Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse Problems 2007. Buades A, Coll B, Morel JM. A non-local algorithm for image denoising. In CVPR 2005. M. Aharon, M. Elad, A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. TSP 2006. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. TIP, 2007. 48
Related Publications and References References J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV 2009. W. Dong, L. Zhang, and G. Shi. Centralized sparse representation for image restoration. In ICCV 2011. Rudin, L.I., Osher, S., Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 1992. Zhu, S.C., Wu, Y., Mumford, D. Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. IJCV 1998. Roth, S., Black, M.J. Fields of experts. IJCV 2009. Schmidt, U., Roth, S. Shrinkage fields for effective image restoration. In CVPR 2014. Chen, Y., Yu, W., Pock, T. On learning optimized reaction diffusion processes for effective image restoration. In CVPR 2015. Rubinstein, Ron, Tomer Peleg, and Michael Elad. Analysis K-SVD: A dictionary-learning algorithm for the analysis sparse model. TSP 2013. Tipping, Michael E., and Christopher M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B, 1999. 49
Related Publications and References References Ke Q, Kanade T. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR 2005. Meng D, Torre FDL. Robust matrix factorization with unknown noise. In ICCV 2013. Candès, Emmanuel J., et al. Robust principal component analysis? Journal of the ACM, 2011. Candès, Emmanuel J., and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009. Cai, J. F., Candès, E. J., & Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010. Zoran D, Weiss Y. From learning models of natural image patches to whole image restoration. In ICCV 2011. M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus. Deconvolutional networks. In CVPR, 2010. B. Wohlberg. Efficient convolutional sparse coding. In ICASSP, 2014. L. W. Zhong and J. T. Kwok. Fast stochastic alternating direction method of multipliers. In ICML, 2013. Note: references for the comparison methods in the tables are omitted; all of them can be found in the corresponding publications above. 50
THANKS! 51