arxiv: v1 [cs.cv] 12 Jul 2018

Size: px

Start display at page:

Download "arxiv: v1 [cs.cv] 12 Jul 2018"

Candace Reed
5 years ago
Views:

1 Subsampled Turbulence Removal Network Wai Ho Chak, Chun Pong Lau, Lok Ming Lui arxiv: v1 [cs.cv] 12 Jul 2018 Abstract We present a deep-learning approach to restore a sequence of turbulence-distorted video frames from turbulent deformations and space-time varying blurs. Instead of requiring a massive training sample size in deep networks, we purpose a training strategy that is based on a new data augmentation method to model turbulence from a relatively small dataset. Then we introduce a subsampled method to enhance the restoration performance of the presented GAN model. The contributions of the paper is threefold: first, we introduce a simple but effective data augmentation algorithm to model the turbulence in real life for training in the deep network; Second, we firstly purpose the Wasserstein GAN combined with l 1 cost for successful restoration of turbulence-corrupted video sequence; Third, we combine the subsampling algorithm to filter out strongly corrupted frames to generate a video sequence with better quality. Keywords: turbulence, data augmentation, WGAN 1. Introduction The problem of image restoration from a sequence of frames under the atmospheric turbulence is challenging due to the dramatic downgrade in the image quality from the geometric distortions and the space-time varying blurs. Multiple factors such as temperature changes, air turbulent flow, densities of air particles, carbon dioxide level and humidity lead to the occurrence of several turbulence layers with various changes in refractive index [1] [2]. These factors addresses: whchak@math.cuhk.edu.hk (Wai Ho Chak), cplau@math.cuhk.edu.hk (Chun Pong Lau ), lmlui@math.cuhk.edu.hk (Lok Ming Lui)

2 together explain the higher chance of obtaining corrupted video sequences in locations where the variation among these factors is large. In practice, either technique in hardware-based adaptive optics [3] [4] or methods in image processing [5] [6] [7] [8] [9] are employed to remove the turbulence distortion in the images, but those prevailing models from either way can barely address to the majority of these factors. Due to the fact that atmospheric turbulence is complicated to be modeled, a deep learning approach which does not heavily require underlying assumptions is more reasonable to tackle the problem than models relying on certain assumptions on turbulence. We are motivated that geometric distortion can be removed and restore a good-quality image by using a generative model that does not explicitly take the above-mentioned factors into consideration. However, the unavailability of massive turbulence-distorted video frames disabled the use of deep learning approach to tackle the problem. In this paper, we introduce a simple but effective data augmentation method to make deep learning approach possible to restore turbulence-corrupted images from limited training data. The method models the real turbulence with different deformation and different extent of blurs in order to provide sufficient training data with limited data distorted from real turbulence online. Since the artificial turbulence is randomly generated with different strength of deformation and blurs, a variety of turbulence-distorted videos are produced from a single image. Although the performance of image restoration is commensurate with the training sample size, the size requirement is not too restrictive and demanding in order to achieve image restoration with the turbulence generating algorithm proposed by us. With the introduced data augmentation method, we can finally make use of deep networks to train the artificially generated data. We purpose to use subsampled Wasserstein Generative Adversarial Network (WGAN) with l 1 cost to remove the geometric distortions as well as blurring effects simultaneously. Normally geometric distortion is observed in a video sequence under atmospheric turbulence. Therefore, it is natural to take the whole frames from the video as 2

3 the input of the turbulence-removal network. Experimentally, we show that the restored image that is generated from a portion of the corrupted video sequence as the input is of much higher quality than a corrupted single frame. Using multiple frames as input is hence essential to obtain a clear image from being degraded under turbulence. WGAN is known for its effectiveness in producing a clear image from noises. Together with the l 1 cost applied to the network, important features of the images can be maintained even though they are corrupted. In the testing stage, we introduce the subsampling algorithm for fine-tuning. Usually, turbulence-distorted video consists of mildly distorted frames. The proposed subsampling method extracts those sharp and mildly distorted frames in order to achieve even better restoration result. We experimentally show that with the proposed subsample method, the performance of removing geometric distortions and deblurring of the degraded images is even better Contribution The main contribution of this paper is listed as follows: 1 The result of this paper demonstrates that our data augmentation method is effective in the way that a sufficiently large dataset is not required. 2 The WGAN model with l 1 loss is shown to achieve our objectives of removing geometric distortions and produce a clear image from turbulence degradation of a corrupted video sequence. 3 We have embedded subsampling method into the proposed network to obtain a subsample of the degraded video with better quality. The subsample can be passed into the generator to produce an image with higher quality. 2. Related Work The main task of restoring the corrupted video frames are the removal of geometric distortion and blur. Some related works are devoted to obtain image 3

4 fusion and a reference image for image registration in order to discard geometric distortion. The reference extraction method proposed by Meinhardt-Llopis and Michel [10] [11] has an underlying assumption that there is zero deformation between the original image and the distorted frames on average, but it may not align with our observation in the real video under turbulence degradation. Besides not being able to discard geometric distortions for highly corrupted video frames, the restored image remains blurry because of the temporal averaging. Another approach to resolve a clear reference image is the method of lucky frame which is the sharpest extracted video frame [12]. Even though the statistical proofs [13] showing a high probability of extracting video frames with sharp texture details given sufficient frames motivate this method, the chance of getting a frame that is entirely sharp everywhere is infrequent. As a result, the Lucky-Region method proposed by Aubaillyet al. [14] choose the sharpest patch from each frames patch-wisely and combine the patches afterward. Motivated by this patch-wise sharpness selection method, another approach introduced by Anantrasirichai et al.[15] suggest having a frame selection prior for registration. One of the drawbacks of this method is that the selection by sorting in only a single step engenders undesirable geometric difference between the reference image and the selected frames. Another drawback is the limitation of the assumption inherited in the objective function that the temporal mean of all the frames over the intensity can approximate the original uncorrupted image. Similarly, a subsampling method introduced by Roggemann [16] generate a temporal mean with a high signal-to-noise ratio of subsample selected from video frames according to the adaptive-optics systems. Robust Principle Component Analysis (RPCA) [17] is a recent approach to resolve the problem of atmospheric turbulence. Low-rank decomposition method proposed by Heet al.[18] divides the video sequence separately into low-rank and sparse parts. A variational approach introduced by Xie et al.[19] is applied to improve the initial reference image as the low-rank image that captures the texture information and discards geometric distortion but still looks blurry. However, registration may fail when there is a large deformation between 4

5 the observed video frames and the reference image. Another recent approach is the joint subsampling and reconstruction variational model proposed by Lau et al. [20]. Unlike the RPCA approach, there is no registration involved during the subsampling and reconstruction processes. By using the proposed energy model with various fidelity terms, the restoration performance is improved for various distortion strength of the atmospheric turbulence. Still, the fluctuating strength of turbulence and other unexplained factors cannot be captured by proposed energy model. The distorted image cannot be restored well from the video frame with high and fluctuating strength of turbulence. A data-driven approach that acquires prior knowledge from the existing turbulence-distorted video data, including the varying strength and unexplained factors, may be a genuine solution to remove geometric distortion and restore the image from atmospheric turbulence Generative Adversarial Networks Generative adversarial networks (GANs) firstly proposed by Goodfellow at al. [21] defines two separated competitors: the generator G θ and the discriminator D ξ. The generator is designed to produce samples from noise Z while a discriminator is designed to distinguish real sample y i and generated sample G θ (z i ). The main objective of the generator is to generate perceptually persuasive samples that are challenging to be discriminated by the real samples. The competition between the generator G and the discriminator D can be described by the minimax objective shown as follows: min max G D E [log D(x)] + E [log(1 D( x))] (1) x P r x Pg where P r is the data distribution and P g is the generated distribution given by x = G(z), z P (z), where z is sampled from a noise distribution. The advantage of GANs is the ability to generate clear samples with high perceptual quality. However, as described by Salimans et al. [22], there are undesirable issues such as vanishing gradients and mode collapse in the training. The difficulties can be explained by the fact that minimizing the objective function for 5

6 GANs is equivalent to minimizing the Jensen-Shannon divergence, which is locally saturated and results in vanishing gradients, between the data and model distributions. Later, Arjovsky et al. [23] addressed the gradient vanishing problem by introducing the weaker Wasserstain-1 distance W (P r, P g ) which gives clear gradients almost everywhere in the GAN model. The competition between the two networks is reformulated as our minimax optimization objective: min max G D D E [D(x)] E [D( x)] (2) x P r x Pg where D is the set of 1-Lipschitz functions such that D L 1. The original Lipschitz constraint enforcement proposed by Arjovsky at al. [23] is weight clipping to [ c, c]. Another approach proposed by Gulrajani et al. [24] is adding the gradient penality term λ E x P x [( x D( x) 2 1) 2 ] (3) The approach does not require hyperparameter tuning and is robust to the selection of the architecture of generator. In contrast to the conventional convolutional neural network, GANs can generate clearer images. WGAN-l 1 proposed by Kupyn et al. has been shown effective in image deblurring. 3. Method Overview 3.1. Data Augmentation for Turbulence The first step of the algorithm proposed is to synthesize sufficient training data distorted by turbulence. A large sample size is necessary for typical tasks using deep learning approach, but unfortunately there is limited turbulencedistorted data available and therefore deep learning approach with experimental results has not yet been proposed for turbulence removal. However, we introduce a new, simple but effective method for generating the training data from a few data for deep learning A single frame is transformed to a sequence of frames by selecting ( ) width height positions randomly. Then we generate a motion vector 6

7 Figure 1: The artificial generation of turbulence-distorted frames with different distortion strength S and fixed blur constant B = 1. field for each patch centered at the selected position, where each vector is sampled from a normal distribution, smoothened by a Gaussian kernel and entrywisely multiplied by a strength value for distortion. The motion vector field overall is stimulated by fusing the vector patches together wherever overlapping and is employed to wrap the original image with accumulated distortion for each overlapping patches. In addition, a Gaussian blur is added to the image to blur the distorted image 3.1. The data augmentation method to synthesize turbulence-distorted video frames is summarized in Algorithm 1. 7

8 Algorithm 1 Distortion and Blur Generation Parameters: M = number of iterations N = 32 - patch size µ - mean of the Guassian kernel σ - standard deviation of the Guassian kernel S - distortion strength, uniform from [0.1,0.4] B - blur constant, uniform from [0.1, 1] 1: procedure DistortBlur (Img, σ, N, M) 2: Create a Guassian kernel from CDF(0.2 rand 1, σ) 3: for i = 1 N do 4: x randi(width 2 N) + N 5: y randi(height 2 N) + N 6: u(x N : x + N, y N : y + N) u(x N : x + N, y N : y + N) + kernel randn S 7: v(x N : x + N, y N : y + N) v(x N : x + N, y N : y + N) + kernel randn S 8: Wrap the image with u, v vector fields 9: Blur the image by a Gaussian kernel with a blur constant B 10: return Distorted Video Frames 3.2. WGAN l 1 with randomized Multiframe Input The proposed turbulence removal network (TRN) is multi-frame subsampled WGAN with the l 1 cost incorporated into the model. Multiframe input is a component of TRN to absorb sufficient information on the turbulence deformation of the original image Randomized Multiframe Input The input in our network is a turbulence-distorted multiframe I originated from a clear image I C of size r s. The conventional input for GANs is a noise vector z R N randomly generated according to the normal distri- 8

9 bution. Then the noise vector z transforms into the desired output through the generator. The architecture of DeBlurGan [25] requires blurred image as an input and produce a de-blurred image. The frames I (1), I(2),..., I(n) I in turbulence-distorted videos are mostly grayscale. Blur is one of the consequences of turbulence observed in the frames. However, a single frame from turbulence-distorted videos as an input is experimentally shown to be ineffective in recovering the original undistorted image. Therefore, using the original architecture from DeBlurGan is insufficient to remove undesirable effects such as geometric distortions and blur from turbulence. The improved version of our new architecture is to include a subsampled multiframe as the input. Instead of using the whole frames as the input, we have a subsampled method which is involved in the testing stage and is effective to produce a clearer image without subsampling. In the training stage, we randomly select m = 20 frames in the whole video in grayscale as the noise term for the GAN model Image Subsampling Given a turbulence-distorted video frames I = (I (1), I(2),..., I(n) ), we introduce a fast and efficient algorithm from a variational model to get an optimal subsample set J of sharp and mildly distorted images. J = {i 1,, i m } is the index set of the subsample set J, where m = J is the number of chosen video frames in the subsample. Simultaneously we obtain a reference image I R for the subsample set J. The variational model is formulated in the following form: E(I R, J) = 1 ( ) F(I R, I (k) J ) + λq(i(k) ) τr(j) (4) k J 2 2 The fidelity term F is the discrepancy term between the reference image and the video frames. In our model, we define F(I R, I (k) ) = IR I (k) for measuring the l 2 distance between the reference image I R and the subsampled video {I (k) } k J. The quality term Q(I (k) ) for each video frame I(k) is based on the 9

10 normalized version of I k 1 : Q(I (k) ) = max 1 i n I(i) 1 I(k) 1 max 1 i n I(i) 1 min 1 i n I(i) 1 (5) The term I (k) is the convolution of I(k) with the Laplacian kernel specifically highlighting the edges and features of objects in the image I (k). The sharper the image I (k), the higher the magnitude of I(k). As a consequence, the normalized quality measure Q(I (k) ) is smaller when I(k) is sharp. The term λ in the energy model E(I R, J) is a positive constant to quantify the importance of sharpness of the frame I (k). The regularization term R is the concave increasing function 1 e ρ J, where ρ > 0 is a constant to quantify the importance of the number of selected frames. The function is chosen in order to acquire more information from additional video frames, whereas the effect on the quality of the reference image I R is reduced with a marginal increase in the subsample size. The detailed formulation of the variational model is described in [20]. 10

11 Algorithm 2 Image Subsampling Parameters: λ - sharpness parameter τ - subsample size parameter ρ - subsample decay rate parameter 1: procedure Image Subsampling (I = (I (1) n i=1 I(i), I(2),, I(n) ), λ, τ, ρ) 2: Compute IR 0 = 1 n 3: Compute the quality measure Q(I (k) ) for each video frame {I(k) }n k=1 4: repeat 5: Given J t 1, I t 1 R. Fixing It 1 R, solve ( J t 1 ) ) = arg min IR t I (k) 2 + λq(i (k) J J ) τ (1 e ρ J 2 6: Compute E 1,k = E 1,k in ascending order. k J IR t I(k) λq(i (k) ) for each k and arrange 7: Compute the sum S j for each j and arrange S j in ascending order. 8: J t {k 1, k 2,, k j1 } 9: Fixing J t, solve I t = arg min I k J t I(k) 10: IR t 1 J t 11: until E1 t 1 E1 t ɛ ( 1 ) J t I R I (k) 2 2 k J t 12: return subsampled image sequence {I (k) } U-Net Architecture for Generator Network The generator network G we use is the U-Net [26], which consists of five types of layers: convolutional layer, deconvolutional layer, max-pooling layer, Randomized Leaky ReLU activation layer (α = 0.2) and instance normalization layer. U-Net is known to involve a contracting path for contextual preservation and a symmetric expanding path for localization, and hence was particularly 11

12 successful in image segmentation, denoising and super-resolution. Therefore, we choose U-net as the main architecture for our generator network G. The subsampled turbulence-distorted multiframe passes through 7 blocks of convolutional layers and 6 blocks of deconvolutional layers to generate a clear contains convolutional layers, fol- image. The first 7 blocks B (1) C, B(2) C,..., B(7) C lowed by the 6 remaining blocks B (1) layers. Each block B (i) C D, B(2) D,..., B(6) D consisting of deconvolutional contains convolutional layers, non-linear activation layers and instance normalization layers. The temporal features extracted in each block are down-sampled by max-polling except the features for the last block B (7) C. The features in B(6) the block B (1) D C and B(7) C are concatenated before passing through in order to retain the deep features without too much information loss. The feature collected in the first block B (1) D the feature from the block B (5) C is then concatenated with to output the feature in the second block B(2) D. Repeating the process, we obtain a clear image I C which is of the same size as the original undistorted image. The generator network is not pre-trained since the input of the architecture is different from the standard one involving image with three feature channels as input. Instead, we have a subsample of turbulence-distorted video frames J which are randomly chosen in the training stage and algorithmically selected by the method introduced in the subsection The generator network G is trained after the critic network D has trained multiple times till optimal to give the clear image I = G(I (i1),..., I(im) ). The loss function which has not yet been proposed for removing geometric distortion and blurs in video frames under turbulence for the generator is defined by L G = N D(I ) + γ N I C I 1 (6) n=1 The first term in the loss function L G is the adversarial loss which encourages an indistinguishable solution from the real one. In order to retain the textures inherited from the turbulence-distorted video frames, we further incorporate the l 1 loss into the loss function L G. Minimization under pixel-wise error terms such as the l 1 and l 2 function are insufficient to produce a clear image [27]. 12

We employ `1 loss instead of the `2 loss in our loss function LG to make the image much less blurry.

13 Instead, these error terms can only result in the non-smooth image with several black-spots and blurry image respectively. Incorporating the pixel-wise error terms with the adversarial loss can avoid the above drawbacks while producing a perceptually undistorted image from geometric distortion. We employ `1 loss instead of the `2 loss in our loss function LG to make the image much less blurry. As a consequence, combining the two terms in the loss function LG removes the geometric-distortion and undesirable artifacts such as blur. Figure 2: The generator network G has the U-net architecture, and the critic network D is the conventional convolutional neural network. The subsampled frames are concatenated before passing through the generator network Critic Network The critic network D is a deep CNN involving convolutional layers, fullyconnected layer, ReLU activation layer and instance normalization layer. We 13

14 denote the first 6 convolutional layers by L (1), L (2),..., L (6) and the last fully connected layer by L (7). The critic values D G(I )) and D I C are passed into the critic network to output the Wasserstain-1 distance max D L 1 E [D(x)] E [D( x)] (7) x P r x Pg The critic network D is trained till optimal before updating the generator network G. The loss function for the critic network D in the training process is provided as follows: ( D ( ) 2 2 L D = D(G(I )) D(G(I C )) + λ αi C + (1 α)i 1), (8) where α is a randomly generated number from uniform distribution U[0, 1]. Since there is no pre-trained model involved in the generator network G, the whole training takes a longer time and the loss blows up. In order to further enforce the 1-Lipschitz assumption in the critic network D, we further impose the weight constraint in the critic network D. The weights in the critic network D are clipped in the interval [ c, c]. Together with this weight constraint, the training becomes more stable. 4. Experiments 4.1. Dataset The dataset for training is collected from Flicker. It consists of 2000 building images and 1000 chimney images. The test dataset consists of 400 building images, collected from Flicker and the CUHK. All the images collected are resized to 256. The collected dataset is synthetically deformed by our data augmentation algorithm. Each image is deformed to produce 100 deformed video frames. Therefore, the whole dataset is enlarged by a factor of 100. There is a limited amount of real turbulence distorted videos available. Hence we are not able to use these videos for training. However, since it is more interesting to know the performance of restoration for real turbulence distorted 14

video frames, we increment two test sample, a building sample and a chimney sample, under real turbulence. 4.2.

15 video frames, we increment two test sample, a building sample and a chimney sample, under real turbulence Training Details The experiments were mainly conducted in PyTorch with a CUDA-enabled GPU. The data augmentation is performed in Matlab before we conduct the deep learning experiment. The distortion strength value and the blurring index are randomly sample from [0.1, 0.4] and [0.1, 1] respectively. We use Adam solver for gradient descent with learning rate of 10 4, β 1 = 0.5 and β 2 = 0.99 for both the generator G θ and the critic D ξ. We set 3 gradient descent steps for D ξ and then 1 step for G θ. We also apply instance normalization and dropout to improve the training. In addition to the gradient penalty term, we enforce the parameters ξ in the range [ 0.01, 0.01]. For each epoch, we train both the network with batch size of 1, and set λ = 10, γ = Furthermore, we randomly select 20 frames from the video sequence as our input. The whole training process for 40 epochs takes around 3 days. Figure 3: The training starting from the 1st epoch (left) to the 9th epoch (right). Each displayed image from its left image, except the first one, are generated for 1-2 epochs. The training performance is gradually better. The restoration mainly focuses on geometric distortion in the first few epochs and then focuses on texture preserving and deblurring in the following epochs. 15

dataset with the artificial synthesis of turbulence.

The right image is a restored image by using the WGAN-l 1 network.

16 5. Result 5.1. Result from Artificial Turbulence using WGAN-L1 The following left images come from the test dataset with the artificial synthesis of turbulence. The left image is a frame from a turbulence distorted video. The right image is a restored image by using the WGAN-l 1 network. (a) Observed frame (b) Restored frame by TRN Figure 4: The restoration performance using the proposed method on the artificial building data. The result demonstrates that the network can remove the geometric distortion and blur from the artificially turbulence-distorted video sequence for the buildings. 16

(a) Observed frame (b) Restored frame by TRN Figure 5: The

geometric distortion and blur from the artificially

(a) Observed frame (b) Restored frame by TRN Figure 6: The

17 (a) Observed frame (b) Restored frame by TRN Figure 5: The restoration performance using the proposed method on the artificial building data. The result demonstrates that the network can remove the geometric distortion and blur from the artificially turbulence-distorted video sequence for the buildings. (a) Observed frame (b) Restored frame by TRN Figure 6: The restoration performance using the proposed method on the artificial chimney data. The restoration performance is as good as the building data. 17

(a) Observed frame (b) Restored frame by TRN Figure 7: The restoration

The restoration performance is as good as the building data. 5.2.

image by using IRIS model [20] together with subsampling method.

more examples show a better image quality with the proposed method.

1 22.2 24.5 PSNR (TRN) 22.0 25.2 26.9 21.4 23.6 21.3 22.8 25.

Restoration from Artificial Turbulence from the LBS deformation Instead of

18 (a) Observed frame (b) Restored frame by TRN Figure 7: The restoration performance using the proposed method on the artificial chimney data. The restoration performance is as good as the building data Result from Artificial Turbulence using IRIS The right image is a restored image by using IRIS model [20] together with subsampling method. The result of the proposed method is comparable to the recent approach, and more examples show a better image quality with the proposed method. (a) (b) (c) (d) (e) (f) (g) (h) PSNR (IRIS) PSNR (TRN) SSIM (IRIS) SSIM (TRN) Restoration from Artificial Turbulence from the LBS deformation Instead of providing an impression that the network can only learn to restore a turbulence-distorted video sequence from the proposed generator, we introduce another turbulence generator that can test the performance of the proposed method. 18

(a) (b) (c) (d) (e) (f) (g) (h) Figure 8: The restoration performance using the IRIS method on the artificial building data.

In the new turbulence generator, we randomly select a position from the deformation field.

The method we use to generate artificial turbulence is deformation using Laplace-Beltrami solver (LBS) [28].

6+ɛ 1 ) cos (4+ɛ 2 )πx 2)] [ ( + (0.6+ɛ 3 ) sin (6+ɛ 4 )πy 2) ( cos (8+ɛ 5 )πxy) ] i, where ɛ i are numbers randomly chosen in the range [0, 0.

19 (a) (b) (c) (d) (e) (f) (g) (h) Figure 8: The restoration performance using the IRIS method on the artificial building data. The PSNR and SSIM measures favor more on the proposed method. In the new turbulence generator, we randomly select a position from the deformation field. A patch-wise triangular mesh is formed with the chosen position as the center. The method we use to generate artificial turbulence is deformation using Laplace-Beltrami solver (LBS) [28]. We purpose to assign the Beltrami coefficient µ which is a measure of nonconformality on each face vertex as follows: [ ( µ = (0.6+ɛ 1 ) cos (4+ɛ 2 )πx 2)] [ ( + (0.6+ɛ 3 ) sin (6+ɛ 4 )πy 2) ( cos (8+ɛ 5 )πxy) ] i, where ɛ i are numbers randomly chosen in the range [0, 0.3] for i = 1, 2 and [ 1, 1] for i = 3, 4, 5. Then we obtain the deformation field by using the LBS solver, and wrap the image. We also apply blurs to the image. The effect of LBS deformation is displayed on the top deformed images. The bottom restored images are produced by the network. The performance of the result shows that the network does not only learn about the deformation of the proposed generator, but the generator is also truly simple and effective. 19

20 (a) Observed frame (b) Restored frame by TRN Figure 9: The restoration performance on the artificial building data deformed by LBS deformation. The geometric distortion is clearly removed even with a different generator. The turbulence removal network can indeed restore the corrupted video sequence besides the one produced by our proposed generator. (a) Observed frame (b) Restored frame by TRN Figure 10: The restoration performance on the artificial building data deformed by LBS deformation. The geometric distortion is clearly removed even with a different generator. The turbulence removal network can indeed restore the corrupted video sequence besides the one produced by our proposed generator. 20

21 (a) Observed frame (b) Restored frame by TRN Figure 11: The restoration performance using the proposed method on the artificial chimney data. The restoration performance is as good as the building data. (a) Observed frame (b) Restored frame by TRN Figure 12: The restoration performance using the proposed method on the artificial chimney data. The restoration performance is as good as the building data. 21

5.4. Restoration from Real Turbulence using WGAN-l 1 In order to further convince that the generator can model real turbulence, we use the existing real turbulence-distorted video sequence without

(a) Observed Frame (b) Single Frame (c) Multi-frame Figure 13: The restoration performance using multi-frame is better than barely using a single frame.

22 5.4. Restoration from Real Turbulence using WGAN-l 1 In order to further convince that the generator can model real turbulence, we use the existing real turbulence-distorted video sequence without ground-truth available online for testing. Fig.13 shows that it is much challenging to restore a single frame. As a result, using multi-frame as input in the WGAN-L1 network is essential. Fig.13 demonstrates a significant improvement when multi-frame is employed. (a) Observed Frame (b) Single Frame (c) Multi-frame Figure 13: The restoration performance using multi-frame is better than barely using a single frame. The network is capable of removing geometric distortion and blur from the real turbulence-distorted video frames without subsampling Restoration from Real Turbulence with Subsampling While we are able to improve the image quality from the distorted video frames, Fig. 14 demonstrates that with subsampling method, the restoration performance is even better. The network can restore a mildly distorted video sequence well after subsampling. 22

Conclusion We introduced a turbulence removal network which is a Generative Adversarial Network (GAN) incorporated with l 1 objective function to discard distortion while removing blur from the

23 (a) Subsampled Frame (b) Restored Frame (a) Subsampled Frame (b) Restored Frame Figure 14: The subsampled video sequence is mildly distorted. With a higher-quality input, the subsampled output achieves better restoration performance. 6. Conclusion We introduced a turbulence removal network which is a Generative Adversarial Network (GAN) incorporated with l 1 objective function to discard distortion while removing blur from the restored image simultaneously. Although there is limited available data corrupted by real turbulence, we proposed a data augmentation method to synthetically generate turbulence which can model the real turbulence, as reflected from the performance of WGAN-L1 network. Unlike other methods requiring subsampling, the performance has been already of good quality without subsampling of the frames in a corrupted video sequence, even with a larger deformation under turbulence or complicated textures of the original image. However, we demonstrated that with subsampling method, the restoration performance can even be enhanced. As a consequence, the prob- 23

24 lem of geometric distortion and blurring of images under turbulence can be effectively resolved by our proposed methods. References References [1] R. Hufnagel, N. Stanley, Modulation transfer function associated with image transmission through turbulent media, JOSA 54 (1) (1964) [2] M. C. Roggemann, B. M. Welsh, B. R. Hunt, Imaging through turbulence, CRC press, [3] J. E. Pearson, Atmospheric turbulence compensation using coherent optical adaptive techniques, Applied optics 15 (3) (1976) [4] R. Tyson, Principles of adaptive optics, CRC press, [5] M. Shimizu, S. Yoshimura, M. Tanaka, M. Okutomi, Super-resolution from image sequence under influence of hot-air optical turbulence, in: Computer Vision and Pattern Recognition, CVPR IEEE Conference on, IEEE, 2008, pp [6] S. M. Seitz, S. Baker, Filter flow, in: Computer Vision, 2009 IEEE 12th International Conference on, IEEE, 2009, pp [7] D. Li, R. M. Mersereau, S. Simske, Atmospheric turbulence-degraded image restoration using principal components analysis, IEEE Geoscience and Remote Sensing Letters 4 (3) (2007) [8] M. A. Vorontsov, Parallel image processing based on an evolution equation with anisotropic gain: integrated optoelectronic architectures, JOSA A 16 (7) (1999) [9] M. Hirsch, S. Sra, B. Schölkopf, S. Harmeling, Efficient filter flow for spacevariant multiframe blind deconvolution, in: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, 2010, pp

25 [10] E. Meinhardt-Llopis, M. Micheli, Implementation of the centroid method for the correction of turbulence, Image Processing On Line 4 (2014) [11] M. Micheli, Y. Lou, S. Soatto, A. L. Bertozzi, A linear systems approach to imaging through turbulence, Journal of mathematical imaging and vision 48 (1) (2014) [12] M. A. Vorontsov, G. W. Carhart, Anisoplanatic imaging through turbulent media: image recovery by local information fusion from a set of shortexposure images, JOSA A 18 (6) (2001) [13] D. L. Fried, Probability of getting a lucky short-exposure image through turbulence, JOSA 68 (12) (1978) [14] M. Aubailly, M. A. Vorontsov, G. W. Carhart, M. T. Valley, Automated video enhancement from a stream of atmospherically-distorted images: the lucky-region fusion approach, in: Atmospheric Optics: Models, Measurements, and Target-in-the-Loop Propagation III, Vol. 7463, International Society for Optics and Photonics, 2009, p C. [15] N. Anantrasirichai, A. Achim, N. G. Kingsbury, D. R. Bull, Atmospheric turbulence mitigation using complex wavelet-based fusion, IEEE Transactions on Image Processing 22 (6) (2013) [16] M. C. Roggemann, C. A. Stoudt, B. M. Welsh, Image-spectrum signal-tonoise-ratio improvements by statistical frame selection for adaptive-optics imaging through atmospheric turbulence, Optical Engineering 33 (10) (1994) [17] E. J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis?, Journal of the ACM (JACM) 58 (3) (2011) 11. [18] R. He, Z. Wang, Y. Fan, D. Fengg, Atmospheric turbulence mitigation based on turbulence extraction, in: Acoustics, Speech and Signal Process- 25

26 ing (ICASSP), 2016 IEEE International Conference on, IEEE, 2016, pp [19] Y. Xie, W. Zhang, D. Tao, W. Hu, Y. Qu, H. Wang, Removing turbulence effect via hybrid total variation and deformation-guided kernel regression, IEEE Transactions on Image Processing 25 (10) (2016) [20] C. P. Lau, Y. H. Lai, L. M. Lui, Variational models for joint subsampling and reconstruction of turbulence-degraded images, arxiv preprint arxiv: [21] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp [22] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, in: Advances in Neural Information Processing Systems, 2016, pp [23] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein gan, arxiv preprint arxiv: [24] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. C. Courville, Improved training of wasserstein gans, in: Advances in Neural Information Processing Systems, 2017, pp [25] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, J. Matas, Deblurgan: Blind motion deblurring using conditional adversarial networks, arxiv preprint arxiv: [26] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp

27 [27] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, arxiv preprint. [28] K. C. Lam, L. M. Lui, Landmark-and intensity-based registration with large deformations via quasi-conformal maps, SIAM Journal on Imaging Sciences 7 (4) (2014)

Removing Atmospheric Turbulence

Removing Atmospheric Turbulence Xiang Zhu, Peyman Milanfar EE Department University of California, Santa Cruz SIAM Imaging Science, May 20 th, 2012 1 What is the Problem? time 2 Atmospheric Turbulence