Estimation of Ambient Light and Transmission Map with Common Convolutional Architecture


Estimation of Ambient Light and Transmission Map with Common Convolutional Architecture

Young-Sik Shin, Younggun Cho, Ayoung Kim (Department of Civil and Environmental Engineering, KAIST, S. Korea) and Gaurav Pandey (Department of Electrical Engineering, IIT Kanpur, India)

Abstract — This paper presents a method for effective ambient light and transmission estimation in underwater images using a common convolutional network architecture. The estimated ambient light and transmission map are used to dehaze the underwater images. Dehazing underwater images is especially challenging due to the unknown and significantly varying ambient light in underwater environments. Unlike common dehazing methods, the proposed method estimates the ambient light along with the transmission map, thereby improving the reconstruction quality of the dehazed images. We evaluate the dehazing performance of the proposed method on real underwater images and compare it to current state-of-the-art techniques.

I. INTRODUCTION

Capturing high-resolution color images in underwater environments has many applications in ocean engineering. A good-quality image from the deep sea can be very useful for scientists studying various underwater phenomena. Despite significant advancements in camera technology, high-quality underwater image acquisition remains an unsolved problem. The scattering of light by water particles, along with the attenuation and color shift of the different wavelengths of ambient light (including external light sources), causes a hazing effect in captured underwater images, as shown in Fig. 1(a). This hazing effect needs to be removed so that a clear picture of the underwater scene can be visualized. Several methods for haze removal based on prior information have been proposed in the past [1]–[6].
Schechner and Karpel [1] used prior information from multiple images taken under different environmental conditions and different degrees of polarization. Narasimhan and Nayar [2] used available depth information to enhance dehazing performance. Fattal [3] presented the first single-image dehazing technique, using independent component analysis (ICA) to decorrelate the transmission and the surface shading; it relies on the assumption that the transmission map and the surface shading factor are locally uncorrelated. He et al. [4] proposed a haze removal method using the dark channel prior (DCP), a strong prior that at least one color channel has low intensity in haze-free images. Zhu et al. [5] proposed the color attenuation prior (CAP), which exploits the fact that the saturation of hazy pixels becomes much lower than that of haze-free pixels. Carlevaris-Bianco et al. [6] used the strong difference between the red channel and the other channels to estimate scene depth from a single underwater image.

Fig. 1. Haze removal on underwater images. (a) Original hazy image. (b) The resulting dehazed image.

Recently, convolutional neural networks (CNNs) have provided promising solutions to many vision tasks [7]–[12], including dehazing [13], [14]. Mai et al. [13] proposed a back-propagation neural network (BPNN) model to estimate the transmission map. Cai et al. [14] proposed a CNN architecture called DehazeNet for transmission map estimation: a hazy image is provided as input to the convolutional architecture, and a regression model is learned to predict the transmission map, which is then used to remove haze via an atmospheric scattering model. This is the work most closely related to ours; however, the proposed method differs in that we also estimate the ambient light in addition to the transmission map, yielding a better reconstruction of the hazy image.
Image dehazing is an effective approach to increase visibility and recover the true radiance of a hazy image. Note that the ambient light in underwater environments is significantly biased, and estimating it accurately benefits many underwater vision applications. Despite this, the ambient light is usually selected somewhat arbitrarily as the brightest or median pixel within the region of lowest estimated transmission [4]–[6], [15]. In this work we therefore focus on fast and effective estimation of both the ambient light and the transmission map. We propose a common convolutional architecture that simultaneously estimates the ambient light and the transmission map for visibility enhancement of a scene degraded by an underwater environment. The rest of the paper is organized as follows. In Section II, we describe the atmospheric scattering model used in

Fig. 2. The overall convolutional architecture. The network contains three stages: multi-scale fusion, feature extraction, and nonlinear regression.

this work. Section III describes the proposed convolutional architecture. Section IV presents the results on simulated and real data captured underwater. In Section V, we present our concluding remarks.

II. ATMOSPHERIC SCATTERING MODEL

In this paper, we adopt the haze model described in [4], which considers the hazed image as a weighted sum of the haze-free image J and the ambient light A. For a pixel (u, v), the hazed pixel value I(u, v) is modeled as

I(u, v) = J(u, v) t(u, v) + A (1 − t(u, v)),   (1)

where I is the observed hazy image, J is the scene radiance, A is the global atmospheric light, and t is the transmission, i.e., the portion of light that reaches the camera without scattering. The transmission value t(u, v) decreases exponentially with the distance the light travels,

t(u, v) = e^{−β d(u, v)},   (2)

where d(u, v) is the depth of the scene point and β is the attenuation coefficient of the medium. If we recover the transmission map t and the global atmospheric light A from a given hazed image, the original scene radiance J can be recovered from the model in (1).

III. CONVOLUTIONAL NEURAL NETWORK (CNN) FOR AMBIENT LIGHT AND TRANSMISSION ESTIMATION

A. Model Architecture

The proposed CNN architecture is composed of three stages of network propagation, as shown in Fig. 2. The first stage is a multi-scale fusion stage inspired by [16]. In this stage, we use an element-wise summation of pixels for dimensionality reduction. This allows us to generate more feature maps in each layer, thereby improving the training accuracy.
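As a concrete sketch of the scattering model in eqs. (1)–(2), the following minimal NumPy example synthesizes a hazy image from a haze-free one (function names are ours for illustration, not from the paper):

```python
import numpy as np

def transmission_from_depth(d, beta):
    """Eq. (2): t(u, v) = exp(-beta * d(u, v))."""
    return np.exp(-beta * np.asarray(d, dtype=float))

def haze_model(J, t, A):
    """Eq. (1): I = J * t + A * (1 - t).

    J : haze-free image in [0, 1], shape (H, W, 3)
    t : transmission, scalar or per-pixel map of shape (H, W)
    A : global ambient light, scalar or length-3 RGB vector
    """
    t = np.asarray(t, dtype=float)
    if t.ndim == 2:                      # broadcast a per-pixel map over channels
        t = t[..., None]
    return J * t + np.asarray(A) * (1.0 - t)
```

With t = 0.5 and A = 1, a mid-gray scene (J = 0.5) is washed out to I = 0.75, matching the weighted-sum interpretation of eq. (1).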
Since we estimate both the ambient light and the transmission map with the same architecture, we fix the value of one while training for the other. Using more feature maps reduces the uncertainty due to the unknown variable (e.g., the transmission when learning the ambient-light network, and vice versa). The final stage consists of a nonlinear regression layer for ambient light and transmission map estimation. A similar idea was recently proposed in [14], which used a multi-scale mapping layer and a maxout operation for haze-relevant feature extraction. However, our model differs in that we estimate both the ambient light and the transmission map from the same network architecture.

1) Multi-scale fusion: The first stage in the architecture is the multi-scale fusion layer, which has been widely used for image enhancement, including single-image dehazing [14], [16], [17]. We use three parallel convolutional layers with filters of size [3×3×32], [5×5×32], and [7×7×32]. We choose a larger number of feature maps than DehazeNet to reduce the uncertainty in the unknown variables. Moreover, at the end of this stage we perform an element-wise summation, whereas DehazeNet only stacks the multi-scale layers. Summing the multi-scale layers helps to reduce the computational complexity of the later stages.

2) Feature extraction: To handle the ill-posed nature of single-image dehazing, previous methods have assumed various features closely related to the properties of hazy images; for example, the dark channel, hue disparity, and RGB channel disparity have been used as haze-relevant features [4]–[6]. Inspired by these methods, the second stage is designed to extract haze-relevant features. It consists of a maxout unit, two convolutional layers with Rectified Linear Unit (ReLU) activations, and a max-pooling layer. The maxout unit [18] is selected to find features along the depth direction of the input data.
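The maxout reduction along the depth (channel) direction can be illustrated with a small NumPy sketch; the group size k below is an assumption for illustration, since the paper does not state it:

```python
import numpy as np

def maxout(features, k):
    """Maxout over the channel (depth) axis: each group of k feature
    maps is reduced to one map by an element-wise maximum.

    features : array of shape (C, H, W), with C divisible by k
    k        : group size (hypothetical value for illustration)
    """
    C, H, W = features.shape
    assert C % k == 0, "channel count must be divisible by the group size"
    return features.reshape(C // k, k, H, W).max(axis=1)
```

For example, 32 feature maps with k = 4 would be reduced to 8 haze-relevant maps, each keeping the strongest response among its group at every pixel.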
After the maxout unit, we use two convolution layers with filters of size [3×3×32]. Lastly, the max-pooling layer is chosen to obtain spatial invariance in the features. In conventional CNNs, max-pooling layers are generally used to overcome local sensitivity and to reduce the resolution of the feature maps. In contrast, we apply the operation densely to prevent loss of resolution, which makes the CNN usable for image restoration.

3) Nonlinear regression: The last stage is the nonlinear regression layer that performs the estimation of the transmission and the ambient light. The convolutional layer used in

Fig. 3. The process of haze removal on underwater images. (a) The original image, (b) the transmission map, and (c) the estimated ambient light. Finally, the dehazed image is recovered in (d).

this stage consists of a single filter of size [3×3×32]. We also add the widely used ReLU layer after every convolution layer to avoid slow convergence and local minima during the training phase [7], [10], [11].

B. Training of the CNN

1) Training data: Training the CNN requires pairs of hazy patches and the corresponding haze-free information (e.g., transmission map and ambient light). In practice, it is very difficult to obtain such a training dataset experimentally. Therefore, we use the haze model in (1) and synthesize hazed patches from haze-free image patches to train our CNN architecture. We use two publicly available datasets, ICL-NUIM [19] and the SUN database [20], for training. We apply random transmission t ∈ (0, 1) and random ambient light A ∈ (0, 1) to small haze-free image patches, assuming that transmission and ambient light are locally constant over small patches. Note that we use random ambient light for underwater images, which is generally a valid assumption for underwater environments. Moreover, a dataset generated in this manner enables the network to estimate transmission more accurately on hazy images with color distortions. In this way, we generate a large number of hazy image patches from haze-free patches with random transmission and ambient light. This training dataset is used to learn the three-stage CNN architecture described above.

2) Training Method: In the proposed model, we use supervised learning between hazy image patches and label data (the transmission value or the ambient light value). Filter weights in the model are learned by minimizing the loss function.
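The patch synthesis described above can be sketched as a minimal NumPy example; the patch size and the use of a seeded generator are our assumptions for illustration:

```python
import numpy as np

def synthesize_hazy_patch(patch, rng):
    """Build one training pair from a haze-free patch via eq. (1),
    treating t and A as constant over the small patch.

    patch : haze-free patch in [0, 1], shape (h, w, 3)
    rng   : numpy.random.Generator
    returns (hazy_patch, t, A)
    """
    t = rng.uniform(0.0, 1.0)                 # random transmission in (0, 1)
    A = rng.uniform(0.0, 1.0, size=3)         # random, possibly color-biased, ambient light
    hazy = patch * t + A * (1.0 - t)
    return hazy, t, A
```

Sampling A independently per channel produces the color-biased (underwater-like) ambient light conditions the text mentions; restricting A to equal channels would give the balanced (aerial-like) case.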
Given the pairs of hazy patches generated as above and their corresponding labels, we use the mean squared error (MSE) as the loss function,

L(Θ) = (1/N) Σ_{i=1}^{N} ‖F(p_i; Θ) − l_i‖²,   (3)

where p_i is the i-th input hazy patch, l_i is its label, and Θ denotes the filter weights. We employ the widely used stochastic gradient descent (SGD) algorithm to train our model.

C. Balanced Scene Radiance Recovery

Once the transmission t(u, v) and the atmospheric light A are obtained, the original scene radiance J can be recovered from the atmospheric scattering model in (1). It is conventionally recovered from the inverse atmospheric scattering model,

J(u, v) = (I(u, v) − A) / max(t(u, v), t₀) + A.   (4)

However, this model cannot recover the original scene radiance in an underwater environment. The attenuation of ambient light underwater depends not only on the distance traveled and the density of particles along the light path, but also on the color/wavelength of the light. For instance, the intensity of the red channel decreases rapidly, whereas the intensity of the blue or green channel decreases slowly. Hence, the ambient light component in images captured underwater is not the true ambient light, which degrades the recovery of scene radiance with the conventional model (4). To solve this problem, we propose a novel balanced scene recovery model,

J(u, v) = (I(u, v) − Â) / max(t̂(u, v), t₀) + A_b,   (5)

where the first term is the direct scene radiance, A_b is the balanced ambient light, t̂(u, v) is the estimated transmission value, and Â is the ambient light estimated with our CNN model. The balanced ambient light A_b is defined as

A_b = ‖Â‖ a_b,   (6)

where a_b is the fixed vector [1/√3, 1/√3, 1/√3] representing the balanced ambient light direction in RGB space. Here we assume that the balanced ambient light has the same magnitude ‖Â‖ for all three color channels.
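The balanced scene radiance recovery of eqs. (5)–(6) can be sketched in NumPy as follows; the lower bound t₀ = 0.1 and the output clipping are our choices for illustration, not values stated in the paper:

```python
import numpy as np

def balanced_recovery(I, t_hat, A_hat, t0=0.1):
    """Eqs. (5)-(6): recover scene radiance with balanced ambient light.

    I     : hazy image in [0, 1], shape (H, W, 3)
    t_hat : estimated transmission map, shape (H, W)
    A_hat : estimated ambient light, length-3 RGB vector
    t0    : lower bound on transmission to avoid division blow-up
    """
    A_hat = np.asarray(A_hat, dtype=float)
    a_b = np.full(3, 1.0 / np.sqrt(3.0))      # balanced direction in RGB space
    A_b = np.linalg.norm(A_hat) * a_b         # same magnitude, balanced color
    t = np.maximum(t_hat, t0)[..., None]      # broadcast over channels
    return np.clip((I - A_hat) / t + A_b, 0.0, 1.0)
```

When the estimated ambient light is already balanced, A_b equals Â and eq. (5) reduces to the conventional recovery (4); the correction only shifts the added-back ambient term for color-biased light.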
The proposed dehazing process with the image reconstructed using the balanced scene radiance recovery model is shown in Fig. 3.

IV. RESULTS

We trained the proposed architecture on about 1 million synthetic hazed patches generated from two publicly available datasets (ICL-NUIM [19] and the SUN database [20]). A mixture of patches from the two datasets was used to cover both indoor (ICL-NUIM) and outdoor (SUN database) scenes. We used the open-source Caffe framework [21] to train our convolutional network. We performed several experiments to verify the robustness of the proposed convolutional architecture. We also compared the proposed method with several state-of-the-art dehazing methods from the literature [4]–[6], [14]. These algorithms can be broadly classified into (i) conventional computer vision techniques that use prior information [4]–[6] and (ii) CNN-based dehazing methods, such as DehazeNet [14] and the proposed method.

Fig. 4. Two types of synthetic hazy patches. (a) Balanced ambient light patches for lighting without color cast (no bias). (b) Biased ambient light patches for color-cast lighting.

Fig. 5. Error statistics as a function of the saturation value of the ambient light on 15K synthetic patches, for He [4], Zhu [5], Cai [14], and the proposed method. The red line shows the estimation error and the gray boundary its spread. The first row shows results under balanced ambient light and the second row under biased ambient light.

TABLE I. TRANSMISSION MAP ACCURACY, MSE (×10⁻²), for DCP [4], CAP [5], DehazeNet [14], and ours, without and with color cast.

A. Transmission Map Estimation

In the haze removal process, the accuracy of the transmission estimate is the most dominant factor in dehazing performance. In the atmospheric scattering model (1), the transmission describes the portion of light that reaches the camera; when light is scattered and the transmission attenuated, haze appears in the image.
In underwater environments this attenuation occurs under significantly biased ambient light, which produces color saturation in the hazy region. The accuracy of the transmission map estimate therefore has a significant effect on the estimated scene radiance. We compare the accuracy of the transmission estimated by various methods on 15K sample patches (held out from the training sets) under two different haze conditions: one without color cast (no bias in the ambient light, as in aerial images) and one with color cast (strong bias in the ambient light, as in underwater images). We synthetically generated the two hazy image sets accordingly (Fig. 4), one with balanced ambient light and the other with biased ambient light.

Fig. 6. Comparison of the estimated transmission maps on a real underwater image. The original hazy image is shown in (a). The results of Carlevaris-Bianco [6] (b), He [4] (c), and the proposed method (f) show promising performance, while Zhu [5] (d) and Cai [14] (e) are unsuccessful at estimating the transmission underwater due to highly saturated color regions.

The transmission map accuracy of the different methods is compared in Table I and Fig. 5. Note that the biased ambient light in underwater scenes disrupts transmission estimation for some methods because they assume balanced ambient light. Table I compares the MSE between the estimated transmission and the ground truth under the two ambient light conditions, and Fig. 5 presents error statistics over the 15K test patches. The proposed method shows the best accuracy in transmission map estimation. The performance of DehazeNet [14] and CAP [5] depends on the ambient light condition: both are competitive under balanced ambient light but fail under biased ambient light. We also observe that DCP [4] performs consistently regardless of color cast; we believe this is mainly because the bias in the ambient light does not affect the DCP values of local patches. The proposed method outperforms DCP under balanced ambient light and still estimates the transmission robustly under biased ambient light. A summarizing illustration of the transmission maps estimated by the different methods is shown in Fig. 6. Plausible transmission maps are produced by He [4], Carlevaris-Bianco [6], and our method, while the others show insufficient performance due to the high color saturation in the water. As these methods depend heavily on RGB values, additional color correction is required to improve their performance (e.g., white balancing [22] and lαβ color correction [23]).

B.
Real Underwater Image Dehazing

We applied the trained network to a set of real underwater images with different levels of haze. Typical underwater images with various color casts were used, as shown in Fig. 7. The six test images have various ambient light conditions. Fig. 7 shows the dehazing results and the estimated ambient light for each method. Carlevaris-Bianco [6] shows good dehazing and color balance among the previously reported methods, because it uses a prior associated with the color-dependent attenuation of light specific to underwater scenes. He [4] and Zhu [5] enhance the contrast of the dehazed images; however, as the ambient light estimation row shows, their estimated ambient light is inaccurate because it is merely computed from the estimated transmission map. Cai [14] in particular fails to recover the scene radiance on color-cast underwater images because the algorithm assumes balanced ambient light. Overall, the proposed dehazing network performs reliably on underwater images. Note that in this experiment both the transmission map and the ambient light were estimated with the proposed common convolutional architecture. These results demonstrate good dehazing performance in underwater environments.

Fig. 7. Comparison of dehazing results with other methods under various ambient light conditions: (a) original images, (b) Carlevaris-Bianco [6], (c) He [4], (d) Zhu [5], (e) Cai [14], (f) proposed. A small color box represents the estimated ambient light for each method, and the images below the color box show the corresponding dehazing results. Note that the best performance is shown in (b) and (f) regardless of the ambient light condition.

V. CONCLUSION

In this paper, we presented a CNN-based ambient light and transmission estimation framework with a common convolutional architecture for single-image haze removal. We evaluated the performance of the proposed method on synthetic data and compared it with existing methods. We also evaluated the qualitative performance of the proposed method on real underwater images. The preliminary results show the promising dehazing ability of the proposed method.

ACKNOWLEDGMENT

This work is supported through a grant from KAIST via the High Risk High Return Project (Award #N111685), NRF (Award #N115984), and the Ministry of Land, Infrastructure and Transport's U-city program.

REFERENCES

[1] Y. Schechner and N. Karpel, "Recovery of underwater visibility and structure by polarization analysis," IEEE Journal of Oceanic Engineering, Jul. 2005.
[2] S. G. Narasimhan and S. Nayar, "Interactive deweathering of an image using physical models," in IEEE Workshop on Color and Photometric Methods in Computer Vision (in conjunction with ICCV), Oct. 2003.
[3] R. Fattal, "Single image dehazing," ACM Transactions on Graphics (TOG), vol. 27, no. 3, pp. 72:1–72:9, Aug. 2008.
[4] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, Dec. 2011.
[5] Q. Zhu, J. Mai, and L. Shao, "A fast single image haze removal algorithm using color attenuation prior," IEEE Transactions on Image Processing, vol. 24, no. 11, Nov. 2015.
[6] N. Carlevaris-Bianco, A. Mohan, and R. M. Eustice, "Initial results in underwater single image dehazing," in Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, Sep. 2010.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint, 2015.
[8] T. Naseer, L. Spinello, W. Burgard, and C. Stachniss, "Robust visual robot localization across seasons using network flows," in Proceedings of the National Conference on Artificial Intelligence (AAAI), 2014.
[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2014.
[10] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," arXiv preprint, 2015.
[11] J. Sun, W. Cao, Z. Xu, and J. Ponce, "Learning a convolutional neural network for non-uniform motion blur removal," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2015.
[12] L. Xu, J. S. Ren, C. Liu, and J. Jia, "Deep convolutional neural network for image deconvolution," in Advances in Neural Information Processing Systems, 2014.
[13] J. Mai, Q. Zhu, D. Wu, Y. Xie, and L. Wang, "Back propagation neural network dehazing," in Proc. IEEE Conf. Robotics and Biomimetics, Dec. 2014.
[14] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, "DehazeNet: An end-to-end system for single image haze removal," arXiv preprint, 2016.
[15] K. Tang, J. Yang, and J. Wang, "Investigating haze-relevant features in a learning framework for image dehazing," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[16] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, "Enhancing underwater images and videos by fusion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[17] Y. Liu, S. Liu, and Z. Wang, "A general framework for image fusion based on multi-scale transform and sparse representation," Information Fusion, vol. 24, 2015.
[18] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, Jun. 2013.
[19] A. Handa, T. Whelan, J. McDonald, and A. Davison, "A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM," in Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China, May 2014.
[20] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[21] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proc. ACM Conf. on Multimedia, 2014.
[22] Y.-C. Liu, W.-H. Chan, and Y.-Q. Chen, "Automatic white balance for digital still camera," IEEE Transactions on Consumer Electronics, vol. 41, no. 3.
[23] G. Bianco, M. Muzzupappa, F. Bruno, R. Garcia, and L. Neumann, "A new color correction method for underwater imaging," The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 40, no. 5, p. 25, 2015.


More information

arxiv: v1 [cs.cv] 8 May 2018

arxiv: v1 [cs.cv] 8 May 2018 PAD-Net: A Perception-Aided Single Image Dehazing Network Yu Liu 1 and Guanlong Zhao 2 1 Department of Electrical and Computer Engineering, Texas A&M University 2 Department of Computer Science and Engineering,

More information

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France

More information

WEAKLY SUPERVISED FOG DETECTION

WEAKLY SUPERVISED FOG DETECTION WEAKLY SUPERVISED FOG DETECTION Adrian Galdran a,, Pedro Costa a, Javier Vazquez-Corral b, Aurélio Campilho a,c a INESC TEC Porto b Universitat Pompeu Fabra c Faculty of Engineering UP R. Dr. Roberto Frias,

More information

Single Image Dehazing with Varying Atmospheric Light Intensity

Single Image Dehazing with Varying Atmospheric Light Intensity Single Image Dehazing with Varying Atmospheric Light Intensity Sanchayan Santra Supervisor: Prof. Bhabatosh Chanda Electronics and Communication Sciences Unit Indian Statistical Institute 203, B.T. Road

More information

Learning visual odometry with a convolutional network

Learning visual odometry with a convolutional network Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com

More information

Optimizing Monocular Cues for Depth Estimation from Indoor Images

Optimizing Monocular Cues for Depth Estimation from Indoor Images Optimizing Monocular Cues for Depth Estimation from Indoor Images Aditya Venkatraman 1, Sheetal Mahadik 2 1, 2 Department of Electronics and Telecommunication, ST Francis Institute of Technology, Mumbai,

More information

Removing rain from single images via a deep detail network

Removing rain from single images via a deep detail network 207 IEEE Conference on Computer Vision and Pattern Recognition Removing rain from single images via a deep detail network Xueyang Fu Jiabin Huang Delu Zeng 2 Yue Huang Xinghao Ding John Paisley 3 Key Laboratory

More information

A Novel Multi-Frame Color Images Super-Resolution Framework based on Deep Convolutional Neural Network. Zhe Li, Shu Li, Jianmin Wang and Hongyang Wang

A Novel Multi-Frame Color Images Super-Resolution Framework based on Deep Convolutional Neural Network. Zhe Li, Shu Li, Jianmin Wang and Hongyang Wang 5th International Conference on Measurement, Instrumentation and Automation (ICMIA 2016) A Novel Multi-Frame Color Images Super-Resolution Framewor based on Deep Convolutional Neural Networ Zhe Li, Shu

More information

arxiv: v1 [cs.cv] 4 Oct 2018

arxiv: v1 [cs.cv] 4 Oct 2018 Progressive Feature Fusion Network for Realistic Image Dehazing Kangfu Mei 1[0000 0001 8949 9597], Aiwen Jiang 1[0000 0002 5979 7590], Juncheng Li 2[0000 0001 7314 6754], and Mingwen Wang 1 arxiv:1810.02283v1

More information

IMPLEMENTATION OF THE CONTRAST ENHANCEMENT AND WEIGHTED GUIDED IMAGE FILTERING ALGORITHM FOR EDGE PRESERVATION FOR BETTER PERCEPTION

IMPLEMENTATION OF THE CONTRAST ENHANCEMENT AND WEIGHTED GUIDED IMAGE FILTERING ALGORITHM FOR EDGE PRESERVATION FOR BETTER PERCEPTION IMPLEMENTATION OF THE CONTRAST ENHANCEMENT AND WEIGHTED GUIDED IMAGE FILTERING ALGORITHM FOR EDGE PRESERVATION FOR BETTER PERCEPTION Chiruvella Suresh Assistant professor, Department of Electronics & Communication

More information

Outline Radiometry of Underwater Image Formation

Outline Radiometry of Underwater Image Formation Outline - Introduction - Features and Feature Matching - Geometry of Image Formation - Calibration - Structure from Motion - Dense Stereo - Radiometry of Underwater Image Formation - Conclusion 1 pool

More information

Finding Tiny Faces Supplementary Materials

Finding Tiny Faces Supplementary Materials Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva}@cs.cmu.edu 1. Error analysis Quantitative analysis We plot the distribution

More information

Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz Supplemental Material

Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz Supplemental Material Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz Supplemental Material Ayush Tewari 1,2 Michael Zollhöfer 1,2,3 Pablo Garrido 1,2 Florian Bernard 1,2 Hyeongwoo

More information

Removing rain from single images via a deep detail network

Removing rain from single images via a deep detail network Removing rain from single images via a deep detail network Xueyang Fu 1 Jiabin Huang 1 Delu Zeng 2 Yue Huang 1 Xinghao Ding 1 John Paisley 3 1 Key Laboratory of Underwater Acoustic Communication and Marine

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

Learning image representations equivariant to ego-motion (Supplementary material)

Learning image representations equivariant to ego-motion (Supplementary material) Learning image representations equivariant to ego-motion (Supplementary material) Dinesh Jayaraman UT Austin dineshj@cs.utexas.edu Kristen Grauman UT Austin grauman@cs.utexas.edu max-pool (3x3, stride2)

More information

Research on Clearance of Aerial Remote Sensing Images Based on Image Fusion

Research on Clearance of Aerial Remote Sensing Images Based on Image Fusion Research on Clearance of Aerial Remote Sensing Images Based on Image Fusion Institute of Oceanographic Instrumentation, Shandong Academy of Sciences Qingdao, 266061, China E-mail:gyygyy1234@163.com Zhigang

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

A Review on Different Image Dehazing Methods

A Review on Different Image Dehazing Methods A Review on Different Image Dehazing Methods Ruchika Sharma 1, Dr. Vinay Chopra 2 1 Department of Computer Science & Engineering, DAV Institute of Engineering & Technology Jalandhar, India 2 Department

More information

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report Figure 1: The architecture of the convolutional network. Input: a single view image; Output: a depth map. 3 Related Work In [4] they used depth maps of indoor scenes produced by a Microsoft Kinect to successfully

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Franziska Mueller 1,2 Dushyant Mehta 1,2 Oleksandr Sotnychenko 1 Srinath Sridhar 1 Dan Casas 3 Christian Theobalt

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Charles R. Qi Hao Su Matthias Nießner Angela Dai Mengyuan Yan Leonidas J. Guibas Stanford University 1. Details

More information

Applications of Light Polarization in Vision

Applications of Light Polarization in Vision Applications of Light Polarization in Vision Lecture #18 Thanks to Yoav Schechner et al, Nayar et al, Larry Wolff, Ikeuchi et al Separating Reflected and Transmitted Scenes Michael Oprescu, www.photo.net

More information

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models [Supplemental Materials] 1. Network Architecture b ref b ref +1 We now describe the architecture of the networks

More information

Specular Reflection Separation using Dark Channel Prior

Specular Reflection Separation using Dark Channel Prior 2013 IEEE Conference on Computer Vision and Pattern Recognition Specular Reflection Separation using Dark Channel Prior Hyeongwoo Kim KAIST hyeongwoo.kim@kaist.ac.kr Hailin Jin Adobe Research hljin@adobe.com

More information

A Fast Semi-Inverse Approach to Detect and Remove the Haze from a Single Image

A Fast Semi-Inverse Approach to Detect and Remove the Haze from a Single Image A Fast Semi-Inverse Approach to Detect and Remove the Haze from a Single Image Codruta O. Ancuti, Cosmin Ancuti, Chris Hermans, Philippe Bekaert Hasselt University - tul -IBBT, Expertise Center for Digital

More information

CANDY: Conditional Adversarial Networks based Fully End-to-End System for Single Image Haze Removal

CANDY: Conditional Adversarial Networks based Fully End-to-End System for Single Image Haze Removal CANDY: Conditional Adversarial Networks based Fully End-to-End System for Single Image Haze Removal Kunal Swami and Saikat Kumar Das (Abstract) Single image haze removal is a very challenging and ill-posed

More information

Robust Image Dehazing and Matching Based on Koschmieder s Law And SIFT Descriptor

Robust Image Dehazing and Matching Based on Koschmieder s Law And SIFT Descriptor Robust Image Dehazing and Matching Based on Koschmieder s Law And SIFT Descriptor 1 Afthab Baik K.A, 2 Beena M.V 1 PG Scholar, 2 Asst. Professor 1 Department of CSE 1 Vidya Academy of Science And Technology,

More information

Automatic Image De-Weathering Using Physical Model and Maximum Entropy

Automatic Image De-Weathering Using Physical Model and Maximum Entropy Automatic Image De-Weathering Using Physical Model and Maximum Entropy Xin Wang, Zhenmin TANG Dept. of Computer Science & Technology Nanjing Univ. of Science and Technology Nanjing, China E-mail: rongtian_helen@yahoo.com.cn

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

Faceted Navigation for Browsing Large Video Collection

Faceted Navigation for Browsing Large Video Collection Faceted Navigation for Browsing Large Video Collection Zhenxing Zhang, Wei Li, Cathal Gurrin, Alan F. Smeaton Insight Centre for Data Analytics School of Computing, Dublin City University Glasnevin, Co.

More information

Single Image Dehazing Using Fixed Points and Nearest-Neighbor Regularization

Single Image Dehazing Using Fixed Points and Nearest-Neighbor Regularization Single Image Dehazing Using Fixed Points and Nearest-Neighbor Regularization Shengdong Zhang and Jian Yao Computer Vision and Remote Sensing (CVRS) Lab School of Remote Sensing and Information Engineering,

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Fog Simulation and Refocusing from Stereo Images

Fog Simulation and Refocusing from Stereo Images Fog Simulation and Refocusing from Stereo Images Yifei Wang epartment of Electrical Engineering Stanford University yfeiwang@stanford.edu bstract In this project, we use stereo images to estimate depth

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

Deep Learning-driven Depth from Defocus via Active Multispectral Quasi-random Projections with Complex Subpatterns

Deep Learning-driven Depth from Defocus via Active Multispectral Quasi-random Projections with Complex Subpatterns Deep Learning-driven Depth from Defocus via Active Multispectral Quasi-random Projections with Complex Subpatterns Avery Ma avery.ma@uwaterloo.ca Alexander Wong a28wong@uwaterloo.ca David A Clausi dclausi@uwaterloo.ca

More information

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Pedestrian Detection based on Deep Fusion Network using Feature Correlation Pedestrian Detection based on Deep Fusion Network using Feature Correlation Yongwoo Lee, Toan Duc Bui and Jitae Shin School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South

More information

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material Yi Li 1, Gu Wang 1, Xiangyang Ji 1, Yu Xiang 2, and Dieter Fox 2 1 Tsinghua University, BNRist 2 University of Washington

More information

Photometric Stereo with Auto-Radiometric Calibration

Photometric Stereo with Auto-Radiometric Calibration Photometric Stereo with Auto-Radiometric Calibration Wiennat Mongkulmann Takahiro Okabe Yoichi Sato Institute of Industrial Science, The University of Tokyo {wiennat,takahiro,ysato} @iis.u-tokyo.ac.jp

More information

Classifying a specific image region using convolutional nets with an ROI mask as input

Classifying a specific image region using convolutional nets with an ROI mask as input Classifying a specific image region using convolutional nets with an ROI mask as input 1 Sagi Eppel Abstract Convolutional neural nets (CNN) are the leading computer vision method for classifying images.

More information

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18, REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS Asmita Goswami [1], Lokesh Soni [2 ] Department of Information Technology [1] Jaipur Engineering College and Research Center Jaipur[2]

More information

/17/$ IEEE 3205

/17/$ IEEE 3205 HAZERD: AN OUTDOOR SCENE DATASET AND BENCHMARK FOR SINGLE IMAGE DEHAZING Yanfu Zhang, Li Ding, and Gaurav Sharma Dept. of Electrical and r Engineering, University of Rochester, Rochester, NY ABSTRACT In

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

Direct Methods in Visual Odometry

Direct Methods in Visual Odometry Direct Methods in Visual Odometry July 24, 2017 Direct Methods in Visual Odometry July 24, 2017 1 / 47 Motivation for using Visual Odometry Wheel odometry is affected by wheel slip More accurate compared

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Combining Semantic Scene Priors and Haze Removal for Single Image Depth Estimation

Combining Semantic Scene Priors and Haze Removal for Single Image Depth Estimation Combining Semantic Scene Priors and Haze Removal for Single Image Depth Estimation Ke Wang Enrique Dunn Joseph Tighe Jan-Michael Frahm University of North Carolina at Chapel Hill Chapel Hill, NC, USA {kewang,dunn,jtighe,jmf}@cs.unc.edu

More information

When Big Datasets are Not Enough: The need for visual virtual worlds.

When Big Datasets are Not Enough: The need for visual virtual worlds. When Big Datasets are Not Enough: The need for visual virtual worlds. Alan Yuille Bloomberg Distinguished Professor Departments of Cognitive Science and Computer Science Johns Hopkins University Computational

More information

Rendering and Modeling of Transparent Objects. Minglun Gong Dept. of CS, Memorial Univ.

Rendering and Modeling of Transparent Objects. Minglun Gong Dept. of CS, Memorial Univ. Rendering and Modeling of Transparent Objects Minglun Gong Dept. of CS, Memorial Univ. Capture transparent object appearance Using frequency based environmental matting Reduce number of input images needed

More information

High-Resolution Image Dehazing with respect to Training Losses and Receptive Field Sizes

High-Resolution Image Dehazing with respect to Training Losses and Receptive Field Sizes High-Resolution Image Dehazing with respect to Training osses and Receptive Field Sizes Hyeonjun Sim, Sehwan Ki, Jae-Seok Choi, Soo Ye Kim, Soomin Seo, Saehun Kim, and Munchurl Kim School of EE, Korea

More information

Multi-scale Single Image Dehazing using Perceptual Pyramid Deep Network

Multi-scale Single Image Dehazing using Perceptual Pyramid Deep Network Multi-scale Single Image Dehazing using Perceptual Pyramid Deep Network He Zhang Vishwanath Sindagi Vishal M. Patel Department of Electrical and Computer Engineering Rutgers University, Piscataway, NJ

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

Single Image Dehazing Using Fixed Points and Nearest-Neighbor Regularization

Single Image Dehazing Using Fixed Points and Nearest-Neighbor Regularization Single Image Dehazing Using Fixed Points and Nearest-Neighbor Regularization Shengdong Zhang and Jian Yao (B) Computer Vision and Remote Sensing (CVRS) Lab, School of Remote Sensing and Information Engineering,

More information

Detecting motion by means of 2D and 3D information

Detecting motion by means of 2D and 3D information Detecting motion by means of 2D and 3D information Federico Tombari Stefano Mattoccia Luigi Di Stefano Fabio Tonelli Department of Electronics Computer Science and Systems (DEIS) Viale Risorgimento 2,

More information

Transfer Learning. Style Transfer in Deep Learning

Transfer Learning. Style Transfer in Deep Learning Transfer Learning & Style Transfer in Deep Learning 4-DEC-2016 Gal Barzilai, Ram Machlev Deep Learning Seminar School of Electrical Engineering Tel Aviv University Part 1: Transfer Learning in Deep Learning

More information

Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing

Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing Deniz Engin Anıl Genç Hazım Kemal Ekenel SiMiT Lab, Istanbul Technical University, Turkey {deniz.engin, genca16, ekenel}@itu.edu.tr Abstract In

More information

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Kihyuk Sohn 1 Sifei Liu 2 Guangyu Zhong 3 Xiang Yu 1 Ming-Hsuan Yang 2 Manmohan Chandraker 1,4 1 NEC Labs

More information

arxiv: v2 [cs.cv] 14 May 2018

arxiv: v2 [cs.cv] 14 May 2018 ContextVP: Fully Context-Aware Video Prediction Wonmin Byeon 1234, Qin Wang 1, Rupesh Kumar Srivastava 3, and Petros Koumoutsakos 1 arxiv:1710.08518v2 [cs.cv] 14 May 2018 Abstract Video prediction models

More information

Recursive Deep Residual Learning for Single Image Dehazing

Recursive Deep Residual Learning for Single Image Dehazing Recursive Deep Residual Learning for Single Image Dehazing Yixin Du and Xin Li West Virginia University LCSEE, 395 Evansdale Drive, Morgantown, WV 26506-6070, U.S.A. yidu@mix.wvu.edu Xin.Li@mail.wvu.edu

More information

Efficient Image Dehazing with Boundary Constraint and Contextual Regularization

Efficient Image Dehazing with Boundary Constraint and Contextual Regularization 013 IEEE International Conference on Computer Vision Efficient Image Dehazing with Boundary Constraint and Contextual Regularization Gaofeng MENG, Ying WANG, Jiangyong DUAN, Shiming XIANG, Chunhong PAN

More information

Day/Night Unconstrained Image Dehazing

Day/Night Unconstrained Image Dehazing Day/Night Unconstrained Image Dehazing Sanchayan Santra, Bhabatosh Chanda Electronics and Communication Sciences Unit Indian Statistical Institute Kolkata, India Email: {sanchayan r, chanda}@isical.ac.in

More information

arxiv: v1 [cs.cv] 15 May 2018

arxiv: v1 [cs.cv] 15 May 2018 A DEEPLY-RECURSIVE CONVOLUTIONAL NETWORK FOR CROWD COUNTING Xinghao Ding, Zhirui Lin, Fujin He, Yu Wang, Yue Huang Fujian Key Laboratory of Sensing and Computing for Smart City, Xiamen University, China

More information

Image Denoising and Blind Deconvolution by Non-uniform Method

Image Denoising and Blind Deconvolution by Non-uniform Method Image Denoising and Blind Deconvolution by Non-uniform Method B.Kalaiyarasi 1, S.Kalpana 2 II-M.E(CS) 1, AP / ECE 2, Dhanalakshmi Srinivasan Engineering College, Perambalur. Abstract Image processing allows

More information

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains Jiahao Pang 1 Wenxiu Sun 1 Chengxi Yang 1 Jimmy Ren 1 Ruichao Xiao 1 Jin Zeng 1 Liang Lin 1,2 1 SenseTime Research

More information

An Approach for Real Time Moving Object Extraction based on Edge Region Determination

An Approach for Real Time Moving Object Extraction based on Edge Region Determination An Approach for Real Time Moving Object Extraction based on Edge Region Determination Sabrina Hoque Tuli Department of Computer Science and Engineering, Chittagong University of Engineering and Technology,

More information

Robust color segmentation algorithms in illumination variation conditions

Robust color segmentation algorithms in illumination variation conditions 286 CHINESE OPTICS LETTERS / Vol. 8, No. / March 10, 2010 Robust color segmentation algorithms in illumination variation conditions Jinhui Lan ( ) and Kai Shen ( Department of Measurement and Control Technologies,

More information

Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform. Xintao Wang Ke Yu Chao Dong Chen Change Loy

Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform. Xintao Wang Ke Yu Chao Dong Chen Change Loy Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform Xintao Wang Ke Yu Chao Dong Chen Change Loy Problem enlarge 4 times Low-resolution image High-resolution image Previous

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

A Survey of Light Source Detection Methods

A Survey of Light Source Detection Methods A Survey of Light Source Detection Methods Nathan Funk University of Alberta Mini-Project for CMPUT 603 November 30, 2003 Abstract This paper provides an overview of the most prominent techniques for light

More information

Depth image super-resolution via multi-frame registration and deep learning

Depth image super-resolution via multi-frame registration and deep learning Depth image super-resolution via multi-frame registration and deep learning Ching Wei Tseng 1 and Hong-Ren Su 1 and Shang-Hong Lai 1 * and JenChi Liu 2 1 National Tsing Hua University, Hsinchu, Taiwan

More information

Air-Light Estimation Using Haze-Lines

Air-Light Estimation Using Haze-Lines Air-Light Estimation Using Haze-Lines Dana Berman Tel Aviv University danamena@post.tau.ac.il Tali Treibitz University of Haifa ttreibitz@univ.haifa.ac.il Shai Avidan Tel Aviv University avidan@eng.tau.ac.il

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information