arxiv: v2 [cs.cv] 15 Mar 2018

Size: px

Start display at page:

Download "arxiv: v2 [cs.cv] 15 Mar 2018"

Paulina Burns
5 years ago
Views:

1 arxiv: v2 [cs.cv] 15 Mar 2018 Blind Image Deconvolution using Deep Generative Priors Muhammad Asim, Fahad Shamshad, and Ali Ahmed The authors have contributed equally to this work. Abstract. This paper proposes a new framework to regularize the ill-posed and non-linear blind image deconvolution problem by using deep generative priors. We employ two separate deep generative models one trained to produce sharp images while the other trained to generate blur kernels from lower-dimensional parameters. The regularized problem is efficiently solved by simple alternating gradient descent algorithm operating in the latent lower-dimensional space of each generative model. We empirically show that by doing so, excellent image deblurring results are achieved even under extravagantly large blurs, and heavy noise. Our proposed method is in stark contrast to the conventional end-to-end approaches, where a deep neural network is trained on blurred input, and the corresponding sharp output images while completely ignoring the knowledge of the underlying forward map (convolution operator) in image blurring. Keywords: Blind image deblurring, generative adversarial networks, variational autoencoder, alternating gradient descent 1 Introduction Blind image deblurring aims to recover true image i and blur kernel k from blurry and possibly noisy observation y. For a uniform and spatially invariant blur, its mathematical formulation is y = i k + n, (1) where is convolution operator and n is additive Gaussian noise. In its full generality, the inverse problem (1) is severely ill-posed as many different instances of i, and k fit the observation y. To resolve between multiple instances, priors are introduced on images and/or blur kernels in the image deblurring algorithms. Priors assume some a a priori model on the true image/blur kernel or both. Common priors include sparsity of the true image and/or blur kernel in some transform domain such as wavelets, curvelets, etc, sparsity of image gradients [1,2], l 0 regularized prior [3], internal patch recurrence [4], low-rank [5,6], hyperlaplacian prior [7], etc. Although generic and extendable to multiple applications, these engineered models are not very effective as many unrealistic images also fit the prior model. In last couple of years, advances in generative modeling of images has taken it well beyond the conventional prior models above. The introduction of such deep generative priors in the image deblurring inverse problem should enable a far more effective regularization yielding sharper and better quality deblurred images. Our experiments in

2 Authors Suppressed Due to Excessive Length Original Deblurred Blurry Fig. 1: Blind image deblurring using deep generative priors.

help of two implicit pre-trained generative models; one trained on clean images, and the other on blur kernels.

Figure 1 confirm this hypothesis, and show that using deep generative priors in solving the inverse problem (1) produces promising

The blurred faces are almost not recognizable by the naked eye due to extravagant blurs yet the algorithm deblurs the faces near

Interestingly, owing to the implicit constraints imposed by generative priors, any deviation, for example, in the deblurred faces

orientation. This form of information loss is mainly harmless in important applications such as face recognition from blurred images.

kernel k belong to the range of their respective pre-trained generator, we can recover a visually appealing, and a sharp

Specifically, the algorithm searches for a pair (î, ˆk) in the range of respective pre-trained generators of images and blur kernels

Implicitly, this aggressively regularizes the alternating gradient descent algorithm to produce sharp and clean images as solutions.

dimension of the images, it not only reduces the number of unknowns in the deblurring problem but also allows for an efficient

2 2 Authors Suppressed Due to Excessive Length Original Deblurred Blurry Fig. 1: Blind image deblurring using deep generative priors. Deblurred images (second row) recovered from the blurry ones (first row) using an alternating gradient descent algorithm with the help of two implicit pre-trained generative models; one trained on clean images, and the other on blur kernels. Extremely blurred faces and numbers, not even recognizable to the naked eye, are recovered faithfully using this approach. Figure 1 confirm this hypothesis, and show that using deep generative priors in solving the inverse problem (1) produces promising deblurring results on the standard data sets of face images and house numbers. The blurred faces are almost not recognizable by the naked eye due to extravagant blurs yet the algorithm deblurs the faces near perfectly with the assistance of generative priors. Interestingly, owing to the implicit constraints imposed by generative priors, any deviation, for example, in the deblurred faces from the original faces does not manifest itself through loss of sharpness rather through slight change of facial expressions, or orientation. This form of information loss is mainly harmless in important applications such as face recognition from blurred images. This paper shows with an empirical study that given a noisy, and blurry image y, and the assumption that the true image i and blur kernel k belong to the range of their respective pre-trained generator, we can recover a visually appealing, and a sharp approximation of the true image using alternating gradient descent algorithm. Specifically, the algorithm searches for a pair (î, ˆk) in the range of respective pre-trained generators of images and blur kernels that explain the blurred image y. Implicitly, this aggressively regularizes the alternating gradient descent algorithm to produce sharp and clean images as solutions. Since the range of the generative models can be traversed by a much lower dimensional latent representations compared to the ambient dimension of the images, it not only reduces the number of unknowns in the deblurring problem but also allows for an efficient implementation of gradients in this lower dimensional space using back propagation through the generators. Our numerical experiments manifest that, in general, the deep generative priors yield better deblurring results compared to the classical image priors studied extensively in the literature. It is instructive to compare our approach against a standard end-to-end training of a deep neural net trained to fit blurred images with corresponding sharp ones [8,9,10,11].

3 Blind Image Deconvolution using Deep Generative Priors 3 The main drawback of the end-to-end framework is that it completely ignores the knowledge of forward map or the governing equations (1) that explain the image blurring phenomenon. Consequently, the deblurring is more sensitive to changes in the blur kernels, images, and noise distributions in the test set that are not representative of the training data. Comparatively, our approach explicitly takes into account the knowledge of forward map (1) in the gradient decent algorithm to achieve robust results. Moreover, our approach does not require expensive retraining of every deep network involved in case of partial changes in blur problem specifications such as alterations in the blur kernels or noise models; in the former case, we only have to retrain the blur generative model, and in the later case, we only need to modify the loss function of the descent algorithm. We show experimentally that our approach is better than classical engineered priors and competitive against end-to-end learning models. We also show that how end-toend models fail under different noise levels and large blurs, while our approach is very robust. 1.1 Main Contributions To the best of our knowledge, this is the first work that investigates the power of deep generative priors to regularize ill-posed and non-linear inverse problem of blind image deblurring. We show that this regularized problem is efficiently solved by simple alternating gradient descent algorithm. Our results show that compared to the conventional engineered linear models, the deep generative priors enforce the image model much more aggressively leading to visually pleasing and sharp deblurred images. The method produces excellent results even under tough conditions of extravagantly large blurs, and heavy noise. The rest of the paper is organized as follows. In Section 2, we give an overview of the related work. We formulate the problem in Section 3 followed by our proposed alternating gradient descent algorithm in Section 4. Section 5 contains experimental results followed by concluding remarks in Section 6. 2 Related Work Neural networks based implicit generative models such as Generative Adverserial Networks (GANs) [12] and Variational Autoencoders (VAEs) [13] have found much success in modelling complex data distributions specially that of images. Due to their power of modelling distributions of natural image manifold, they have been extensively used to solve ill-pose inverse problems like compressed sensing [14], image superresolution [15], image inpainting [16] etc. Recently GANs and VAEs have also been used for blind image deblurring. In [17] the authors jointly deblur and superresolve low resolution blurry text and face images by introducing novel feature matching loss term during training process of GAN. They show their proposed formulation favours clean high resolution images over low resolution blurry images. Similarly sparse coding based framework has been proposed in [18] consisting of both VAE that learns data prior by encoding its features into compact form,

4 4 Authors Suppressed Due to Excessive Length and GAN for discriminating clean and blurry image features. Their method can work on both space-invariant and space variant blurs and show competitive performance when compared with recent image deblurring algorithms. Recently in [19] authors employed variant of GAN, conditional GAN for blind motion deblurring in an end-to-end framework and optimize it using multi-component loss functions consisting of content and adversarial loss. They also evaluate the quality of deblurred images through their algorithm on object detection task and show promising results. However all these generative models based deblurring pproaches are end-to-end and suffering from same draw backs as other deep learning techniques i.e. retraining for different noise level and different blurs. Our approach is inspired by [14] that use pretrained generator on natural images to solve the ill posed compressed sensing problem. Similar approach has been used for image inpainting problem [16]. We also note the work of [20] and [21] that use pretrained generators for circumventing the issue of adversarial attacks on deep neural networks. 3 Problem Formulation We assume the image i R n and blur kernel k R n in (1) are members of some structured classes I of images, and K of blurs, respectively. As an example, I may be a set of celebrity faces, or vehicle number plate images, and K comprises of natural motion and Gaussain blurs. A representative sample set from both classes I and K is employed to train generative models for each class. We denote by the mappings G I : R l R n and G K : R m R n, the generators for class I, and K, respectively. Given low-dimensional inputs z i R l, and z k R m, the generators G I, and G K generate new samples G I (z i ), and G K (z k ) that are representative of the classes I, and K, respectively. These generators are fixed after training. To recover the clean image, and blur kernel (i, k) from the blurred image y in (1), we propose minimizing the following objective function (î, ˆk) := argmin y i k 2. (2) i Range(G I ) k Range(G K ) where Range(G I ) and Range(G K ) is the set of all the image and blurs that can be generated by G I and G K, respectively. In words, we want to find an image i and a blur kernel k that best explains the forward model (convolution operator) (1) from the range of their respective generators. Ideally, the range of a generator comprises of only the samples drawn from the distribution of the image or blur class. Constraining the solution (î, ˆk) to lie only in generator ranges, therefore, implicitly reduces the solution ambiguities inherent to the ill-posed blind deconvolution problem, and forces the solution to be members of classes I, and K. The minimization program in (2) can be equivalently formulated in the lower dimensional, latent representation space as follows (ẑ i, ẑ k ) = argmin z i R l,z k R m y G I (z i ) G K (z k ) 2. (3)

5 Blind Image Deconvolution using Deep Generative Priors 5 Fig. 2: Block diagram of proposed approach. z t i and z t k sampled from Gaussian normal distribution for alternating gradient descent are used to minimize the loss. The optimal pair (ẑ i, ẑ k ) which minimizes the residual loss are used to generate image and blur estimates (î, ˆk). This optimization program can be thought of as tweaking the latent representation vectors z i, and z k, (input to the generators G I, and G K, respectively) until these generators generate an image i and blur kernel k whose convolution comes as close to y as possible. The optimization program in (3) is obviously non-convex owing to the bilinear convolution operator, and non-linear deep generative models. We resort to an alternating gradient descent algorithm to find a local minima (ẑ i, ẑ k ). Importantly, the weights of the generators are always fixed as they enter into this algorithm as pre-trained models. At a given iteration, the algorithm fixes z i and takes a descent step in z k, and vice verse. The gradient step in each variable involves a forward and back propagation through the generators. Section 4.3 talks about the back propagation, and gives explicit gradient forms for descent in each z i and z k for this particular algorithm. The estimated deblurred image and the blur kernel are acquired by a forward pass of the solutions ẑ i and ẑ k through the generators G I and G K. Mathematically, (î, ˆk) = (G I (ẑ i ), G K (ẑ k )). 4 Image Deblurring Algorithm Our approach requires that we have pre-trained generative models G I and G K for classes I and K, respectively. As a result, we train GANs and VAEs on the clean images and blur kernels. We briefly recap the training process below.

6 6 Authors Suppressed Due to Excessive Length 4.1 Training the Generative Models A generative model 1 G will either be trained via adversarial learning [12] or variational inference [13]. Generative adversarial networks (GAN) learn the distribution p(i) of images in class I by playing an adversarial game. A discriminator network D learns to differentiate between true images sampled from p(i) and those produced by the generator network G, while G tries to fool the discriminator. The cost function describing the game is given by min max E p(i) log D(i) + E p(z) log(1 D(G(z))), G D where p(z) is the distribution of latent random variables z, and is usually defined to be a known and simple distribution such as p(z) = N (0, I). Variational autoencoders (VAE) learn the distribution p(i) by maximizing a lower bound on the log likelihood: log p(i) E q(z i) log p(i z) KL(q(z i) p(z)), where the second term on the right hand side is the Kullback-Leibler divergence between known distribution p(z), and q(z i). The distribution q(z i) is a proxy for the unknown p(z i). Under a rich enough function model q(z i), the lower bound is expected to be tight. The functional forms of q(z i), and p(i z) each are modelled via a deep network. The right hand side is maximized by tweaking the network parameters. The deep network p(i z) is the generative model that produces samples of p(i) from latent representation z. Generative model G I for the face images is trained using adversarial learning for visually better quality results. Each of the generative model G I for other image datasets, and G K for blur kernels are trained using variational inference framework above. 4.2 Naive Deblurring After the training in the last section, the weights of the generative models G I, and G K are fixed for the rest of the algorithm. To deblur an image y, a simplest possible strategy is to find an image closest to y in the range of the given generator G I of clean images. Mathematically, this amounts to solving the following optimization program argmin y G I (z i ), î = G I (ẑ i ) (4) z i R l Although non-convex, a local minima ẑ i can be achieved via gradient descent implemented using the back propagation algorithm. The recovered image î is obtained by a forward pass of ẑ i through the generative model G I. Expectedly, this approach fails to produce reasonable recovery results; see Figure 3. The reason being that this back projection approach completely ignores the knowledge of the forward blur model in (1). We now address this shortcoming by including the forward model and blur kernel in our proposed objective. 1 The discussion in this section applies to both G I, and G K, therefore, we ignore the subscripts.

7 Blind Image Deconvolution using Deep Generative Priors 7 Range Blurry Original Fig. 3: Naive Deblurring. We gradually increase blur size from left to right and demonstrate the failure of naive deblurring by finding the closest image in the range of the image generator (last row) to blurred image. 4.3 Alternating Gradient Descent via Back Propagation We discovered in the previous section that simply finding a clean image close to the blurred one in the range of the clean image generator G I is not good enough. A more natural and effective strategy is to instead find a pair consisting of a clean image and a blur kernel in the range of G I, and G K, respectively, whose convolution comes as close to the blurred image y as possible. As outlined in Section 3 that this amounts to solving the optimization program below argmin y G I (z i ) G K (z k ) 2, (5) z i R l,z k R m where is the convolution operator. To reduce clutter, we restrict ourself to circular convolution for the discussion in this section. Define an n n DFT matrix F as F [ω, t] = 1 n e j2πωt/n, 1 ω, t n, where F [ω, t] denotes the (ω, t)th entry of the Fourier matrix. Since the convolution operator reduces to multiplication in the Fourier domain, we can equivalently write the objective function above in the Fourier domain as argmin F y nf G I (z i ) F G K (z k ) 2, (6) z i R l,z k R m where the operator computes the pointwise product of the entries of the vectors. Finally, incorporating the fact that latent representation vectors z i, and z k can be equivalently assumed to be coming from standard Gaussian distributions in the both the adversarial learning and variational inference framework, outlined in Section 4.1, we can further add l 2 penalty terms on the latent representations. This brings us to our final optimization program for image deblurring argmin F y nf G I (z i ) F G K (z k ) 2 + γ z i 2 + λ z k 2, (7) z i R l,z k R m where λ, and γ are scalar free parameters. For brevity, we denote the objective by L(z i, z k ) = F y nf G I (z i ) F G K (z k ) 2 + γ z i 2 + λ z k 2.

8 8 Authors Suppressed Due to Excessive Length Algorithm 1 Alternating Gradient Descent Input: y, G I, G K and η Output: Estimates î and ˆk Initialize: z (0) i N (0, I), z (0) k N (0, I) for t = 0, 1, 2,... do z (t) i z (t+1) i z (t+1) k z (t) k - η zi L(z (t) i - η z k L(z(t) i end for î G I(ẑ i), ˆk G K(ẑ k ), z (t) k );, z (t) k ); As pointed out earlier, the objective function above is non-convex due to non-linearities introduced from pointwise multiplications and generative models, therefore, we resort to finding a local minima using an alternating gradient descent step in each of the unknowns z i, and z k. We begin by initializing each of the vectors z i, and z k by a sample from a standard Gaussian density. To circumvent non-convexity of the above program, we use different random initializations of latent variables and find minimizers using each instance pair (which we refer to as random restarts). From these possible solutions, we extract the final solution which best minimizes (6), which we define as residual loss. At a given step, we fix one of the two unknowns vectors z i, z k, and take a gradient step in the other. Algorithm 1 formally introduces the proposed alternating gradient descent scheme. We now compute the gradient expressions 2 for each of the variables z i, and z k. Start by defining residual r t at the tth iteration: Let Ġ K = r t := F G K (z t k) nf G I (z t i) F y, z k G K (z k ) and ĠI = z i G I (z i ). Then, it is easy to see that z t k L(z t i, z t k) = Ġ KF [ r t n F (G I (z t i )] + λz t k, z t i L(z t i, z t k) = nġ IF [ r t F G K (z t k )] + γz t i. For illustration, take the example of a two layer generator G I (z i ), which is simply G I (z i ) = relu(w 2 (relu(w 1 z i ))), where W 1 : R l1 l, and W 2 : R n l1 are the weight matrices at the first, and second layer, respectively. In this case ĠI = W 2,zi W1,zi where W 1,zi = diag(w 1 z i > 0)W 1 and W 2,zi = diag(w 2 W1,zi > 0)W 2. From this expression, it is clear that alternating 2 For a function f(x) of variable x = u + ιv, the Wirtinger derivatives of f(x) with respect to x, and x are defined as f x = 1 ( f 2 u ι f ), and v f x = 1 ( f 2 u + ι f ) v

9 Blind Image Deconvolution using Deep Generative Priors 9 Table 1: Architectures for VAEs used for Blur, MNIST and SVHN. Here, conv(m,n,s) represents convolutional layer with m filters of size n and stride s. Similarly, convt represent transposed convolution layer. Maxpool(n,m) represents a max pooling layer with stride m and pool size of n. Finally, fc(m) represents a fully connected layer of size m. The decoder is designed to be the mirror reflection of the encoder in each case. Model Architectures Model Encoder Decoder Blur VAE conv(20, 2 2, 1) relu maxpool(2 2, 2) conv(20, 2 2, 1) relu maxpool(2 2, 2) fc(50), fc(50) fc(50) z k z k fc(720) relu reshape upsample(2 2) convt(20, 2 2, 1) relu upsample(2 2) convt(20, 2 2, 1) relu convt(1, 2 2, 1) sigmoid MNIST VAE SVHN VAE conv(20, 3 3, 1) relu maxpool(2 2, 2) conv(20, 3 3, 1) relu maxpool(2 2, 2) fc(500) fc(100), fc(100) fc(100) z i conv(128, 2 2, 2) batch-norm relu conv(256, 2 2, 2) batch-norm relu conv(512, 2 2, 2) batch-norm relu fc(100), fc(100) fc(100) z i z i fc(500) relu reshape upsample(2 2) convt(20, 3 3, 1) relu upsample(2 2) convt(20, 3 3, 1) relu convt(1, 3 3, 1) sigmoid z i fc(8192) reshape convt(512, 2 2, 2) batch-norm relu convt(256, 2 2, 2) batch-norm relu convt(128, 2 2, 2) batch-norm relu conv(3, 1 1, 1) sigmoid gradient descent algorithm for the blind image deblurring requires alternate back propagation through the generators G I, and G K as illustrated in Figure 2. To update z t 1 i, we fix z t 1 k, compute the Fourier transform of a scaling of the residual vector r t, and back propagate it through the generator G I. Similar update strategy is employed for keeping zi t fixed. z t 1 k 5 Experiments and Results In this section we evaluate performance of our proposed algorithm on three datasets MNIST [22], SVHN [23], and CelebA [24]. We show its competitiveness against baseline deblurring approaches and demonstrate its robustness to extravagant blurs and noise. 5.1 Datasets and Generators Description Image Datasets and Generators: First dataset, MNIST, consists of grayscale images of handwritten digits with 60, 000 training and 10, 000 test examples. For MNIST, we trained a VAE with architecture defined in Table 1 with latent dimension size (z i ) 100, batch size of 32 and learning rate using adam optimizer with remaining parameters set to default values. Second dataset, SVHN, consists of real world images of house numbers obtained from google street view images. We used extra training set of SVHN containing relatively less complicated 531, 131 images of which 30, 000 were held out as test set. These images are rescaled to [0 1] before training a VAE mentioned in Table 1 with size of z i equal to 100, batch size of 1500 and learning rate of 10 5 using adam optimizer. Third dataset, CelebA contains more than 200, 000 RGB face images of size of different celebrities. We center crop CelebA images to and scale

10 10 Authors Suppressed Due to Excessive Length MNIST SVHN CelebA γ λ Steps (t) Step size γ λ Steps (t) Step size γ λ Steps (t) Step size e t e t Table 2: Parameters of proposed alternating gradient descent algorithm used in each experiment. them in range [ 1 1] before feeding them into deep convolutional generative adversarial network (DCGAN) [25] for training with default parameters. Model is trained by updating G I (z i ) twice and G K (z k ) once in each cycle to avoid fast convergence. Blur Datasets and Generators: We generate two blur datasets, A and B. Blur dataset A, modelled after camera shake blur, contains motion blurs of length 10 with uniformly varying angle between 0 to 360 degrees. Blur dataset B contains centered overlapping motion blurs (from dataset A) and Gaussian blurs. These Gaussian blurs have uniformly varying standard deviation between 0.5 and 1.5. For each dataset, we generate 80, 000 blurs and split them into 60, 000 training and 20, 000 test set. We train two separate VAEs with the same architecture on each dataset. Model architectures for blur VAEs are presented in Table 1. These VAEs are trained using Adam optimizer with size of latent dimension z k 50, batch size 5, and learning rate After training, the decoder part is extracted as the desired generator G K. To show robustness of proposed approach we also create a variant of dataset A, which we named as Extravagant Blurs, that have length of 30 instead of 10. For this dataset, we trained another VAE with the same architecture as before except for the last layer of the decoder: relu instead of sigmoid. These large blurs significantly degrade the images upto a point that they are not even recognizable. To recover sharp images from these blurred images, is a significant challenge. 5.2 Blind Image Deblurring Results In this section, we evaluate the performance of our algorithm on noisy blurred images, generated by convolving images i and blurs k from their respective test sets. We train generative models on training data of respective dataset (VAE for MNIST and SVHN and GAN for CelebA). All experiments are performed by adding 1% 3 Gaussian noise to blurry images unless stated otherwise. Parameters of our proposed alternating gradient descent algorithm are mentioned in Table 2 Since the deblurred image using our proposed algorithm is constrained to lie in the range of the generator G I, we introduce i range, the closest image in l 2 distance to i I lying in range of generator G I. This is found by using (4) with y = i. Baseline Methods: We choose dark prior deblurring (DP) [26] and convolutional neural network (CNN) [9] trained in an end-to-end setting for deblurring as baseline algorithms. DP makes prior assumption on sparsity of dark channel (prior based on the 3 For an image scaled between 0 and 1, Gaussian noise of 1% translates to Gaussian noise with standard deviation σ = 0.01 and mean µ = 0.

Blind Image Deconvolution using Deep Generative Priors 11 Range Ours CNN DP Blurry Original Fig. 4: Image deblurring results on SVHN, CelebA and MNIST.

It can be seen that the recovered image î is close to i range, which is the closest image in the generator Range to the original image.

11 Blind Image Deconvolution using Deep Generative Priors 11 Range Ours CNN DP Blurry Original Fig. 4: Image deblurring results on SVHN, CelebA and MNIST. Images from test set along with corresponding blur kernels are convolved to produce blurry images. Qualitatively our approach is better than Dark Prior (DP) and competitive against CNN. It can be seen that the recovered image î is close to i range, which is the closest image in the generator Range to the original image. statistics of clean images) of blurred image and has shown promising results. We use default parameters for all experiments. On the other hand CNN [9] demonstrates power of end-to-end deep convolutional neural networks and shows that they significantly outperform traditional prior based methods. We choose CNN as a proxy for the latest deblurring methods, where neural networks are trained in end-to-end fashion to directly deblur images. Through experiments, we highlight weakness of these deblurring methods. We trained and fine tuned this network on CelebA and SVHN, for each blur dataset, separately, with 150, 000 training images each and 1% Gaussian noise.

12 12 Authors Suppressed Due to Excessive Length CelebA SVHN CelebA (Blur size 30) Methods DP CNN Ours DP CNN Ours DP CNN Ours PSNR(dB) SSIM Table 3: Quantitative comparison of proposed approach with dark prior (DP) and CNN for CelebA and SVHN dataset with blur test set A and B. We randomly sample 40 images from test set of CelebA and SVHN, 20 are blurred with blur A and 20 with blur B. This table shows average performance of PSNR and SSIM on these images. Results and Discussion: 4 Qualitative and quantitative results of proposed approach along with baseline algorithms are shown in Figure 4 and Table 3, respectively for blur datasets A and B. We compare Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM)[28] of the proposed algorithm with these baseline methods. It can be seen that our approach is better than dark prior in terms of both PSNR and SSIM, and competitive against end-to-end trained CNN. Close resemblance of the recovered images with corresponding generator s range shows that in our approach the generator is not just hallucinating details, but finds the closest image in its range to the true image. In Table 3, there is a significant performance gap both in terms of PSNR and SSIM between our approach and CNN for CelebA when compared with SVHN in case of SVHN, the proposed approach performs just as well as CNN. This can be attributed to the range of generator G I for SVHN. Since, SVHN images are relatively simple compared to CelebA, its generator almost entirely spans the set I. This fact, allows the generator of SVHN to recover almost exact images thus showing excellent performance. Face images are much more complex due to which they may not always lie in it s generator range. This explains the greater source of error in case of CelebA as compared to SVHN. To provide evidence for this claim, we investigate the sources of error. Range Error Analysis: As explained in [14], three sources of error can exist: estimates ẑ i and ẑ k found by gradient descent may not be the optimal solutions true solution lies far away from the range of generators error due to noise. Because of random restarts, the first error is reduced considerably, whereas the second error may dominate if the true image does not lie near or in the range of generator. Here, we analyse how does this error affect the performance of our approach. For every tuple (i test, k test, y test) where i test and k test are the image and blur from test set, and y test being the corresponding blurred image, we generate another tuple (i range, k range, y range ), where i range is the closest image to i test in Range(G I ), k range is the closest kernel to k test in Range(G K ) and y range is corresponding blurred image. Using y test and y range, we use our approach to recover estimates of image and blur for each case. In the later case, since images already lie in generator range, the second error will be eliminated, but 4 Although we formally analyse our algorithm on DCGAN, but we show in Figure 1 that any pre-trained generative model that can generate high resolution images such as PG-GAN [27] can also be used with excellent qualitative results. This highlights the transferability of our algorithm, where only G I needs to be changed for different image sets with possibly different resolutions, but requires the same blur generator for deblurring. We do not report detailed analysis for PG-GAN due to computational limitations.

13 Blind Image Deconvolution using Deep Generative Priors 13 Test Reconstruction error e test = i test î Range Reconstruction error e range = i range î Improvement Ratio e test/e range Table 4: Range error analysis: We compare average (per pixel) test and range reconstruction error for a total 200 images with 50 images from each image and blur dataset. Improvement ratio indicates that the reconstruction error is less than half on average when image being deblurred lies in the range of the generator. (a) Original (b) Blurry (c) CNN (d) Ours (e) Range (a) 10% (b) 20% (c) 30% (d) 10% (e) 20% (f) 30% Fig. 5: Blind image deblurring with large blurs. Images from CelebA test set along with corresponding blur kernels are shown in (a). Corresponding blurry images (b) are recovered using CNN (c) and proposed approach (d) with the closest image in generator Range shown in (e). Fig. 6: Noise Analyis. We show our recovered images (second row) from noisy blurred images (first row) and compare our results with CNN (third row), by varying noise uniformly from 10% to 30%. this may not be the case for test set images. By comparing the reconstruction errors (per pixel) i test î with i range î where î is the recovered image in each case, we can quantify the performance improvement when images lie in range of their respective generators. From Table 4, it can be seen that reconstruction error is improved by a factor of 2.7, if images lie in the range of their respective generators. Robustness to Extravagant Blurs and Noise: Here, we show the effectiveness of proposed algorithm on large blurs and high noise. To demonstrate this, we use images from CelebA and blurs from Extravagant Blur test set. Dark Prior (DP) completely fails to estimate true image from these blurry images, so we do not report its quantitative results in Table 3. We retrained our previous CNN model on these extravagantly blurred images and report quantitative results in Table 3. These results show that in the regime of large blurs, our algorithm is expected to outperforms end-to-end trained model. These end-to-end trained models suffer on large blurs because they do not embed the forward blur model. Qualitatively, our results shown in Figure 5 are sharp and visually appealing as compared to CNN results. Our proposed algorithm performs surprisingly well in high noise regime as shown qualitatively in Figure 6 and quantitatively in Table 5. For comparison, we used the CNN trained on CelebA and SVHN with 1% Gaussian noise to show robustness of our

14 14 Authors Suppressed Due to Excessive Length CelebA SVHN Noise (%) PSNR(dB) CNN SSIM PSNR(dB) Ours SSIM Table 5: Performance of proposed approach at different noise levels. Quantitative comparison in terms of PSNR and SSIM with CNN. Our proposed method outperform CNN for all noise level with significant margin. approach compared to end-to-end trained models. Quantitative results are shown in Table 5. Our algorithm shows impressive performance by outperforming CNN by 5 8 db in term of PSNR. CNN based approach works only good for noise level on which it is trained (1% gaussian noise here) and completely fails on other noise levels as shown in Figure 6 while our approach give impressive results even with 50% noise levels Table Limitations Currently, our approach is limited by the range of generators G I (z i ) and G K (z k ). GANs although capable of generating sharp and realistic looking images, they still lack in their generative capabilities. A well trained generator with much greater range can recover exact images. Similarly gradient descent can get stuck in local minima as shown in Figure 7. We demonstrate failure cases in which case more random restarts can improve performance. Although we use 10 random restarts in all experiments, but this value is found experimentally and by no mean close to the optimum. Moreover, fine tuning parameters of the models and optimizers is also a challenge and need further investigation. (a) Range (b) 10 RR (c) 25 RR Fig. 7: Failure case of recovering reconstructed image on large blur dataset. As clear from figure increasing number of random restarts (RR) can improve reconstruction. 6 Conclusion and Future Work We leverage the power of deep generative priors to regularize ill-posed and non-linear inverse problem of blind image deblurring. We show that this regularized problem is

15 Blind Image Deconvolution using Deep Generative Priors 15 efficiently solved by simple alternating gradient descent algorithm. Through extensive experiments we show that our results outperform conventional engineered linear models and competitive against end-to-end deep learning approaches. Contrary to these deep learning techniques, our approach does not require expensive re-training of neural networks for different noise levels and is also robust to extravagant blurs. The initial success of deep generative priors in solving the blind image deblurring problem highlighted in this paper opens up exciting questions of similar strategy in other related ill-posed, non-linear inverse problems such as phase retrieval from magnitude only measurements of images; a problem of great interest in physics, and biomedical imaging community [29] References 1. Chan, T.F., Wong, C.K.: Total variation blind deconvolution. IEEE transactions on Image Processing 7(3) (1998) Fergus, R., Singh, B., Hertzmann, A., Roweis, S.T., Freeman, W.T.: Removing camera shake from a single photograph. In: ACM transactions on graphics (TOG). Volume 25., ACM (2006) Xu, L., Zheng, S., Jia, J.: Unnatural l0 sparse representation for natural image deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition. (2013) Michaeli, T., Irani, M.: Blind deblurring using internal patch recurrence. In: European Conference on Computer Vision, Springer (2014) Ahmed, A., Recht, B., Romberg, J.: Blind deconvolution using convex programming. IEEE Transactions on Information Theory 60(3) (2014) Ren, W., Cao, X., Pan, J., Guo, X., Zuo, W., Yang, M.H.: Image deblurring via enhanced low-rank prior. IEEE Transactions on Image Processing 25(7) (2016) Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: Advances in Neural Information Processing Systems. (2009) Schuler, C.J., Hirsch, M., Harmeling, S., Schölkopf, B.: Learning to deblur. IEEE transactions on pattern analysis and machine intelligence 38(7) (2016) Hradiš, M., Kotera, J., Zemcík, P., Šroubek, F.: Convolutional neural networks for direct text deblurring. In: Proceedings of BMVC. Volume 10. (2015) 10. Chakrabarti, A.: A neural approach to blind motion deblurring. In: European Conference on Computer Vision, Springer (2016) Svoboda, P., Hradiš, M., Maršík, L., Zemcík, P.: Cnn for license plate motion deblurring. In: Image Processing (ICIP), 2016 IEEE International Conference on, IEEE (2016) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. (2014) Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arxiv preprint arxiv: (2013) 14. Bora, A., Jalal, A., Price, E., Dimakis, A.G.: Compressed sensing using generative models. arxiv preprint arxiv: (2017) 15. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. arxiv preprint (2016)

16 16 Authors Suppressed Due to Excessive Length 16. Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) Xu, X., Sun, D., Pan, J., Zhang, Y., Pfister, H., Yang, M.H.: Learning to super-resolve blurry face and text images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) Nimisha, T., Singh, A.K., Rajagopalan, A.: Blur-invariant deep learning for blind-deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. arxiv preprint arxiv: (2017) 20. Samangouei, P., Kabkab, M., Chellappa, R.: Defense-gan: Protecting classifiers against adversarial attacks using generative models. (2018) 21. Ilyas, A., Jalal, A., Asteri, E., Daskalakis, C., Dimakis, A.G.: The robust manifold defense: Adversarial training using generative models. arxiv preprint arxiv: (2017) 22. LeCun, Y.: The mnist database of handwritten digits. lecun. com/exdb/mnist/ (1998) 23. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning. Volume (2011) Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision. (2015) Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: Advances in Neural Information Processing Systems. (2016) Pan, J., Sun, D., Pfister, H., Yang, M.H.: Blind image deblurring using dark channel prior. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2016) Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arxiv preprint arxiv: (2017) 28. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13(4) (2004) Jaganathan, K., Eldar, Y.C., Hassibi, B.: Phase retrieval: An overview of recent developments. arxiv preprint arxiv: (2015)

DCGANs for image super-resolution, denoising and debluring

DCGANs for image super-resolution, denoising and debluring Qiaojing Yan Stanford University Electrical Engineering qiaojing@stanford.edu Wei Wang Stanford University Electrical Engineering wwang23@stanford.edu