arXiv: v1 [cs.CV] 21 Feb 2018


Density-aware Single Image De-raining using a Multi-stream Dense Network

He Zhang, Vishal M. Patel
Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ
{he.zhang92,vishal.m.patel}@rutgers.edu

Abstract
Single image rain streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. We present a novel density-aware multi-stream densely connected convolutional neural network-based algorithm, called DID-MDN, for joint rain density estimation and de-raining. The proposed method enables the network itself to automatically determine the rain-density information and then efficiently remove the corresponding rain streaks guided by the estimated rain-density label. To better characterize rain streaks with different scales and shapes, a multi-stream densely connected de-raining network is proposed which efficiently leverages features from different scales. Furthermore, a new dataset containing images with rain-density labels is created and used to train the proposed density-aware network. Extensive experiments on synthetic and real datasets demonstrate that the proposed method achieves significant improvements over recent state-of-the-art methods. In addition, an ablation study is performed to demonstrate the improvements obtained by different modules in the proposed method. Code can be found at:

1. Introduction
In many applications such as drone-based video surveillance and self-driving cars, one has to process images and videos containing undesirable artifacts such as rain, snow, and fog. Furthermore, the performance of many computer vision systems often degrades when they are presented with images containing some of these artifacts. Hence, it is important to develop algorithms that can automatically remove these artifacts. In this paper, we address the problem of rain streak removal from a single image.
Various methods have been proposed in the literature to address this problem [17, 6, 35, 19, 2, 14, 9, 1, 36, 33, 5]. One of the main limitations of the existing single image de-raining methods is that they are designed to deal with certain types of rainy images, and they do not effectively take the various shapes, scales and densities of rain drops into account. State-of-the-art de-raining algorithms such as [33, 6] often tend to over de-rain or under de-rain the image if the rain condition present in the test image is not properly considered during training. For example, when the rainy image shown in Fig. 1(a) is de-rained using the method of Fu et al. [6], it tends to remove some important parts of the image, such as the right arm of the person, as shown in Fig. 1(b). Similarly, when [33] is used to de-rain the image shown in Fig. 1(d), it tends to under de-rain the image and leaves some rain streaks in the output de-rained image. Hence, more adaptive and efficient methods that can deal with the different rain-density levels present in images are needed.

Figure 1: Image de-raining results. (a) Input rainy image. (b) Result from Fu et al. [6]. (c) DID-MDN. (d) Input rainy image. (e) Result from Yang et al. [33]. (f) DID-MDN. Note that [6] tends to over de-rain the image while [33] tends to under de-rain the image.

One possible solution to this problem is to build a very large training dataset with sufficient rain conditions, containing various rain-density levels with different orientations and scales. This has been achieved by Fu et al. [6] and Yang et al. [33], who synthesize large-scale datasets consisting of rainy images under various conditions and train a single network on them for image de-raining. However, one drawback of this approach is that a single network may not have enough capacity to learn all the variations present in the training samples. It can be observed from Fig. 1 that both methods tend to either over de-rain or under de-rain the image. An alternative solution is to learn a density-specific model for de-raining. However, this solution lacks flexibility in practical de-raining, as the density label of a given rainy image is needed to determine which network to use.

In order to address these issues, we propose a novel Density-aware Image De-raining method using a Multi-stream Dense Network (DID-MDN) that can automatically determine the rain-density information (i.e., heavy, medium or light) present in the input image (see Fig. 2). The proposed method consists of two main stages: rain-density classification and rain streak removal. To accurately estimate the rain-density level, a new residual-aware classifier that makes use of the residual component in the rainy image for density classification is proposed in this paper. The rain streak removal algorithm is based on a multi-stream densely-connected network that takes into account the distinct scale and shape information of rain streaks. Once the rain-density level is estimated, we fuse the estimated density information into our final multi-stream densely-connected network to get the final de-rained output. Furthermore, to efficiently train the proposed network, a large-scale dataset consisting of 12,000 images with different rain-density levels/labels (i.e., heavy, medium and light) is synthesized. Fig. 1(c) and (f) present sample results from our network, where one can clearly see that DID-MDN does not over de-rain or under de-rain the image and is able to provide better results as compared to [6] and [33].

This paper makes the following contributions:
1. A novel DID-MDN method which automatically determines the rain-density information and then efficiently removes the corresponding rain streaks guided by the estimated rain-density label is proposed.
2. Based on the observation that the residual can be used as a better feature representation for characterizing rain-density information, a novel residual-aware classifier to efficiently determine the density level of a given rainy image is proposed.
3. A new synthetic dataset consisting of 12,000 training images with rain-density labels and 1,200 test images is synthesized. To the best of our knowledge, this is the first dataset that contains rain-density label information. Although the network is trained on our synthetic dataset, it generalizes well to real-world rainy images.
4. Extensive experiments are conducted on three highly challenging datasets (two synthetic and one real-world), and comparisons are performed against several recent state-of-the-art approaches. Furthermore, an ablation study is conducted to demonstrate the effects of different modules in the proposed network.

2. Background and Related Work
In this section, we briefly review several recent related works on single image de-raining and multi-scale feature aggregation.

2.1. Single Image De-raining
Mathematically, a rainy image y can be modeled as a linear combination of a rain-streak component r with a clean background image x, as follows:

$$y = x + r. \qquad (1)$$

In single image de-raining, given y, the goal is to recover x. As can be observed from (1), image de-raining is a highly ill-posed problem: both x and r are unknown, so many different pairs (x, r) can explain the same observation y. Unlike video-based methods [23, 29, 25], which leverage temporal information in removing rain components, single image methods cannot exploit temporal redundancy, and several prior-based methods have been proposed in the literature to deal with this problem. These include sparse coding-based methods [14, 9, 41], low-rank representation-based methods [2, 35] and GMM-based (Gaussian mixture model) methods [17].
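To make the ill-posedness of (1) concrete, the following minimal Python sketch builds two different background/rain decompositions that produce exactly the same rainy observation y. For brevity it assumes images are flat lists of integer pixel intensities (a toy representation, not how the network stores images), and the function names are hypothetical. It also shows that once a background x is fixed, the rain layer is simply the residual r = y - x, the quantity the residual-aware classifier later relies on.

```python
from typing import List

Image = List[int]  # hypothetical toy representation: a flattened grayscale image


def compose(x: Image, r: Image) -> Image:
    """Additive rain model of Eq. (1): y = x + r, applied pixel by pixel."""
    if len(x) != len(r):
        raise ValueError("x and r must have the same number of pixels")
    return [xi + ri for xi, ri in zip(x, r)]


def residual(y: Image, x: Image) -> Image:
    """Rain layer implied by a candidate background: r = y - x."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same number of pixels")
    return [yi - xi for yi, xi in zip(y, x)]


# Two different explanations of the same rainy pixels.
x_a, r_a = [100, 120, 90, 80], [20, 40, 0, 60]    # darker background, strong streaks
x_b, r_b = [110, 140, 90, 120], [10, 20, 0, 20]   # brighter background, weak streaks

y_a = compose(x_a, r_a)
y_b = compose(x_b, r_b)
print(y_a, y_b, y_a == y_b)       # [120, 160, 90, 140] [120, 160, 90, 140] True
print(residual(y_a, x_a) == r_a)  # True
print(residual(y_a, x_b) == r_b)  # True
```

Because y alone cannot distinguish these decompositions, a de-raining method must rely on priors or a learned mapping to select a plausible x.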
One of the limitations of some of these prior-based methods is that they often tend to over-smooth the image details [14, 35]. Recently, due to the immense success of deep learning in both high-level and low-level vision tasks [8, 31, 38, 21, 32, 34], several CNN-based methods have also been proposed for image de-raining [3, 5, 33, 6]. In these methods, the idea is to learn a mapping between input rainy images and their corresponding ground truths using a CNN structure.

2.2. Multi-scale Feature Aggregation
It has been observed that combining convolutional features at different levels (scales) can lead to a better representation of an object in the image and its surrounding context [7, 39, 8, 11]. For instance, to efficiently leverage features obtained from different scales, the FCN (fully convolutional network) method [18] uses skip connections and adds high-level prediction layers to intermediate layers to generate pixel-wise prediction results at multiple resolutions. Similarly, the U-Net architecture [24] consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. The HED model [30] employs deeply supervised structures and automatically learns rich hierarchical representations that are fused to resolve the challenging ambiguity in edge and object boundary detection. Multi-scale features have also been leveraged in various applications such as semantic segmentation [39], face alignment [20], visual tracking [16], crowd counting [27], action recognition [42], depth estimation [4], single image dehazing [22, 37] and also single image de-raining [33]. Similar to [33], we also leverage a multi-stream network to capture rain-streak components with different scales and shapes. However, rather than using two convolutional layers with different dilation factors to combine features from different scales, we use the densely-connected block [11] as the building module and then connect the features from each block together for the final rain-streak removal. The ablation study demonstrates the effectiveness of our proposed network compared with the structure proposed in [33].

Figure 2: An overview of the proposed DID-MDN method. The proposed network contains two modules: (a) residual-aware rain-density classifier, and (b) multi-stream densely-connected de-raining network. The goal of the residual-aware rain-density classifier is to determine the rain-density level given a rainy image. On the other hand, the multi-stream densely-connected de-raining network is designed to efficiently remove the rain streaks from the rainy images guided by the estimated rain-density information.

3. Proposed Method
The proposed DID-MDN architecture mainly consists of two modules: (a) a residual-aware rain-density classifier, and (b) a multi-stream densely connected de-raining network. The residual-aware rain-density classifier aims to determine the rain-density level given a rainy image. On the other hand, the multi-stream densely connected de-raining network is designed to efficiently remove the rain streaks from the rainy images guided by the estimated rain-density information. The entire network architecture of the proposed DID-MDN method is shown in Fig. 2.

3.1. Residual-aware Rain-density Classifier
As discussed above, even though some of the previous methods achieve significant improvements in de-raining performance, they often tend to over de-rain or under de-rain the image.
This is mainly due to the fact that a single network may not be sufficient to learn the different rain densities occurring in practice. We believe that incorporating density-level information into the network can benefit the overall learning procedure and hence lead to better generalization across different rain conditions [23]. Similar observations have also been made in [23], where two different priors are used to characterize light rain and heavy rain, respectively. Unlike [23], which uses two priors to characterize different rain-density conditions, we use the rain-density label estimated by a CNN classifier to guide the de-raining process. To accurately estimate the density information given a rainy input image, a residual-aware rain-density classifier is proposed, where the residual information is leveraged to better represent the rain features. In addition, to train the classifier, a large-scale synthetic dataset consisting of 12,000 rainy images with density labels is synthesized. Note that there are only three classes (i.e., labels) present in the dataset, and they correspond to low, medium and high density.

One common strategy in training a new classifier is to fine-tune a pre-trained model such as VGG-16 [26], ResNet [8] or DenseNet [11] on the newly introduced dataset. One of the fundamental reasons for using a fine-tuning strategy on a new dataset is that the discriminative features encoded in these pre-trained models can accelerate training and can also improve generalization. However, we observed that directly fine-tuning such a deep model on our task is not an efficient solution. This is mainly due to the fact that the high-level features (deeper part) of a CNN tend to focus on localizing the discriminative objects in the input image [40]. Hence, relatively small rain streaks may not be localized well in these high-level features. In other words, the rain-streak information may be lost in the high-level features, which may degrade the overall classification performance. As a result, it is important to come up with a better feature representation to effectively characterize rain streaks (i.e., rain density).

From (1), one can regard $r = y - x$ as the residual component, which can be used to characterize the rain density. To estimate the residual component ($\hat{r}$) from the observation y, a multi-stream dense-net (without the label fusion part) is trained on the new dataset with heavy-density rain. Then, the estimated residual is used as the input to train the final classifier. In this way, the residual estimation part can be regarded as the feature extraction procedure (footnote 1), which is discussed in Section 3.2. The classification part is mainly composed of three convolutional layers (Conv) with kernel size 3×3, one average pooling (AP) layer with kernel size 9×9 and two fully-connected layers (FC). Details of the classifier are as follows:

Conv(3,24)-Conv(24,64)-Conv(64,24)-AP-FC(127896,512)-FC(512,3),

where (3,24) means that the input consists of 3 channels and the output consists of 24 channels. Note that the final layer consists of a set of 3 neurons indicating the rain-density class of the input image (i.e., low, medium, high). An ablation study, discussed in Section 4.3, is conducted to demonstrate the effectiveness of the proposed residual-aware classifier as compared with the VGG-16 [26] model.

Loss for the Residual-aware Classifier. To efficiently train the classifier, a two-stage training protocol is leveraged.
A residual feature extraction network is first trained to estimate the residual part of the given rainy image; then a classification sub-network is trained using the estimated residual as the input and is optimized using the ground-truth (rain-density) labels. Finally, the two stages (feature extraction and classification) are jointly optimized. The overall loss function used to train the residual-aware classifier is as follows:

$$\mathcal{L} = \mathcal{L}_{E,r} + \mathcal{L}_{C}, \qquad (2)$$

where $\mathcal{L}_{E,r}$ indicates the per-pixel Euclidean loss for estimating the residual component and $\mathcal{L}_{C}$ indicates the cross-entropy loss for rain-density classification.

Footnote 1: The classification network can be regarded as two parts: (1) a feature extractor and (2) a classifier.

3.2. Multi-stream Dense Network
It is well known that different rainy images contain rain streaks with different scales and shapes. Considering the images shown in Fig. 3, the rainy image in Fig. 3(a) contains smaller rain streaks, which can be captured by small-scale features (with smaller receptive fields), while the image in Fig. 3(b) contains longer rain streaks, which can be captured by large-scale features (with larger receptive fields). Hence, we believe that combining features from different scales can be a more efficient way to capture various rain-streak components [10, 33]. Based on this observation and motivated by the success of using multi-scale features for single image de-raining [33], a more efficient multi-stream densely-connected network to estimate the rain-streak components is proposed, where each stream is built on the dense-block introduced in [11] with a different kernel size (i.e., a different receptive field). These multi-stream blocks are denoted by Dense1 (7×7), Dense2 (5×5), and Dense3 (3×3), shown as the yellow, green and blue blocks, respectively, in Fig. 2.

Figure 3: Sample images containing rain streaks with various scales and shapes. (a) contains smaller rain streaks, (b) contains longer rain streaks.
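The link between kernel size and receptive field can be checked with a little arithmetic. The Python sketch below approximates each stream as a stack of stride-1, undilated convolutions sharing one square kernel size k, for which the receptive field after n layers is 1 + n(k - 1). This is a simplification: it ignores the transition (up/down-sampling) layers, which enlarge the real receptive fields further, and the depth n = 6 is a hypothetical value chosen only for illustration, not a layer count taken from the architecture.

```python
def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field (side length in pixels) of `num_layers` stacked
    stride-1, undilated convolutions with a square `kernel_size` kernel.

    Each layer widens the field by (kernel_size - 1): RF = 1 + n * (k - 1).
    """
    if kernel_size < 1 or num_layers < 0:
        raise ValueError("kernel_size must be >= 1 and num_layers >= 0")
    return 1 + num_layers * (kernel_size - 1)


# Kernel sizes of the three streams, as stated in the text.
STREAM_KERNELS = {"Dense1": 7, "Dense2": 5, "Dense3": 3}

if __name__ == "__main__":
    n = 6  # hypothetical depth, for illustration only
    for name, k in STREAM_KERNELS.items():
        print(f"{name}: {k}x{k} kernels, {n} layers -> {receptive_field(k, n)} px")
    # Dense1: 7x7 kernels, 6 layers -> 37 px
    # Dense2: 5x5 kernels, 6 layers -> 25 px
    # Dense3: 3x3 kernels, 6 layers -> 13 px
```

Under this simplification, the 7×7 stream sees close to three times the spatial extent of the 3×3 stream at equal depth, which matches the intuition that long streaks are better captured by the larger-kernel stream.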
In addition, to further improve the information flow among different blocks and to leverage features from each dense-block in estimating the rain-streak components, a modified connectivity is introduced, where all the features from each block are concatenated together for rain-streak estimation. Rather than leveraging only two convolutional layers in each stream [33], we create short paths among features from different scales to strengthen feature aggregation and to obtain better convergence. To demonstrate the effectiveness of our proposed multi-stream network compared with the multi-scale structure proposed in [33], an ablation study is conducted, which is described in Section 4.

To leverage the rain-density information to guide the de-raining process, the up-sampled label map (footnote 2) is concatenated with the rain-streak features from all three streams. Then, the concatenated features are used to estimate the residual ($\hat{r}$) rain-streak information. In addition, the residual is subtracted from the input rainy image to estimate the coarse de-rained image. Finally, to further refine the estimated coarse de-rained image and better preserve details, two additional convolutional layers with ReLU are adopted as the final refinement.

Footnote 2: For example, if the label is 1, then the corresponding up-sampled label map has the same dimensions as the output features from each stream, and all the pixel values of the label map are 1.

There are six dense-blocks in each stream. Mathematically, each stream can be represented as

$$s_j = \mathrm{cat}[DB_1, DB_2, \ldots, DB_6], \qquad (3)$$

where cat indicates concatenation, $DB_i$, $i = 1, \ldots, 6$, denotes the output of the i-th dense block, and $s_j$, $j = 1, 2, 3$, denotes the j-th stream. Furthermore, we adopt different transition-layer combinations (footnote 3) and kernel sizes in each stream. Details of each stream are as follows:

Dense1: three transition-down layers, three transition-up layers and kernel size 7×7.
Dense2: two transition-down layers, two no-sampling transition layers, two transition-up layers and kernel size 5×5.
Dense3: one transition-down layer, four no-sampling transition layers, one transition-up layer and kernel size 3×3.

Note that each dense-block is followed by a transition layer. Fig. 4 presents an overview of the first stream, Dense1.

Figure 4: Details of the first stream Dense1.

Footnote 3: The transition layer can function as an up-sampling transition, a down-sampling transition or a no-sampling transition [12].

Loss for the De-raining Network. Motivated by the observation that a CNN feature-based loss can better improve semantic edge information [13, 15], and to further enhance the visual quality of the estimated de-rained image [36], we also leverage a weighted combination of the pixel-wise Euclidean loss and the feature-based loss. The loss for training the multi-stream densely connected network is as follows:

$$\mathcal{L} = \mathcal{L}_{E,r} + \mathcal{L}_{E,d} + \lambda_F \mathcal{L}_F, \qquad (4)$$

where $\mathcal{L}_{E,d}$ represents the per-pixel Euclidean loss function to reconstruct the de-rained image and $\mathcal{L}_F$ is the feature-based loss for the de-rained image, defined as

$$\mathcal{L}_F = \frac{1}{CWH} \left\| F(\hat{x})^{c,w,h} - F(x)^{c,w,h} \right\|_2^2, \qquad (5)$$

where F represents a non-linear CNN transformation and $\hat{x}$ is the recovered de-rained image. Here, we have assumed that the features are of size W×H with C channels. In our method, we compute the feature loss from layer relu1_2 of the VGG-16 model [26].

3.3. Testing
During testing, the rain-density label is first estimated using the proposed residual-aware classifier. Then, the up-sampled label map and the corresponding input image are fed into the multi-stream network to get the final de-rained image.

4. Experimental Results
In this section, we present the experimental details and evaluation results on both synthetic and real-world datasets. De-raining performance on synthetic data is evaluated in terms of PSNR and SSIM [28]. The performance of different methods on real-world images is evaluated visually, since ground-truth images are not available. The proposed DID-MDN method is compared with the following recent state-of-the-art methods: (a) the discriminative sparse coding-based method (DSC) [19] (ICCV 15), (b) the Gaussian mixture model (GMM) based method [17] (CVPR 16), (c) the CNN method (CNN) [5] (TIP 17), (d) the Joint Rain Detection and Removal (JORDER) method [33] (CVPR 17), (e) the Deep Detail Network method (DDN) [6] (CVPR 17), and (f) the Joint Bi-layer Optimization (JBO) method [41] (ICCV 17).

4.1. Synthetic Dataset
Even though several large-scale synthetic datasets exist [6, 36, 33], they lack the corresponding rain-density label for each synthetic rainy image. Hence, we develop a new dataset, denoted as Train1, consisting of 12,000 images, where each image is assigned a label based on its rain-density level. There are three rain-density labels present in the dataset (i.e., light, medium and heavy).
There are roughly 4,000 images per rain-density level in the dataset. Similarly, we also synthesize a new test set, denoted as Test1, which consists of a total of 1,200 images. It is ensured that each dataset contains rain streaks with different orientations and scales. Images are synthesized using Photoshop. We modify the noise level introduced in step 3 of the rain-synthesis procedure (footnote 4) to generate images with different rain densities, where light, medium and heavy rain conditions correspond to noise levels of 5% to 35%, 35% to 65%, and 65% to 95%, respectively (footnote 5). Sample synthesized images under these three conditions are shown in Fig. 5. To better test the generalization capability of the proposed method, we also randomly sample 1,000 images from the synthetic dataset provided by Fu et al. [6] as another test set, denoted as Test2.

Footnote 5: We use three labels because, during our experiments, we found that having more than three rain-density levels does not significantly improve performance. Hence, we only use three labels (heavy, medium and light) in the experiments.
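The noise-level ranges above imply a simple mapping from the Photoshop noise setting to a rain-density label. The Python sketch below encodes that mapping. Two details are our assumptions, since the text does not specify them: the shared boundaries (35% and 65%) are assigned to the heavier class, and a noise level is drawn uniformly within a class when synthesizing an image. The names `DENSITY_RANGES`, `label_for_noise` and `sample_noise` are hypothetical.

```python
import random

# Noise-level ranges (percent) per rain-density label, as given in the text.
# Ranges are treated as half-open [low, high), except that 95% maps to
# "heavy"; this tie-breaking rule is an assumption.
DENSITY_RANGES = {
    "light": (5.0, 35.0),
    "medium": (35.0, 65.0),
    "heavy": (65.0, 95.0),
}


def label_for_noise(noise_pct: float) -> str:
    """Return the rain-density label for a noise level given in percent."""
    if noise_pct == 95.0:
        return "heavy"
    for label, (low, high) in DENSITY_RANGES.items():
        if low <= noise_pct < high:
            return label
    raise ValueError(f"noise level {noise_pct}% is outside the 5%-95% range")


def sample_noise(label: str, rng: random.Random) -> float:
    """Draw a noise level for a label, uniformly within its range (assumption)."""
    low, high = DENSITY_RANGES[label]
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for label in DENSITY_RANGES:
        noise = sample_noise(label, rng)
        print(f"{label}: sampled {noise:.1f}% -> {label_for_noise(noise)}")
```

Keeping the ranges in one table means the labeling rule and the sampling rule cannot drift apart, which matters because the density label is the supervision signal for the residual-aware classifier.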

Table 1: Quantitative results on Test1 and Test2, evaluated in terms of average SSIM and PSNR (dB) and reported as SSIM/PSNR. Columns: Input, DSC [19] (ICCV 15), GMM [17] (CVPR 16), CNN [5] (TIP 17), JORDER [33] (CVPR 17), DDN [6] (CVPR 17), JBO [41] (ICCV 17), DID-MDN.

Figure 5: Sample synthetic images under three different conditions (heavy, medium, light).

Table 2: Quantitative results compared with three baseline configurations on Test1. Columns: Single, Yang-Multi [33], Multi-no-label, DID-MDN. Rows: PSNR (dB), SSIM.

Table 3: Accuracy of rain-density estimation evaluated on Test1. Columns: VGG-16 [26], Residual-aware.

4.2. Training Details
During training, an image patch is randomly cropped from the input image (or its horizontal flip). Adam is used as the optimization algorithm with a mini-batch size of 1. The learning rate starts from an initial value and is divided by 10 after 20 epochs. The models are trained for a fixed maximum number of iterations. We use weight decay and a momentum of 0.9. The entire network is trained using the PyTorch framework. During training, we set $\lambda_F = 1$. All the parameters are chosen via cross-validation using the validation set.

4.3. Ablation Study
The first ablation study is conducted to demonstrate the effectiveness of the proposed residual-aware classifier compared to the VGG-16 [26] model. The two classifiers are trained using our synthesized training samples Train1 and tested on the Test1 set. The classification accuracy of both classifiers on Test1 is tabulated in Table 3. It can be observed that the proposed residual-aware classifier is more accurate than the VGG-16 model in predicting the rain-density levels.

In the second ablation study, we demonstrate the effectiveness of different modules in our method by conducting the following experiments:
Single: A single-stream densely connected network (Dense2) without the procedure of label fusion.
Yang-Multi [33] (footnote 6): The multi-stream network of [33] trained without the procedure of label fusion.
Multi-no-label: Multi-stream densely connected network trained without the procedure of label fusion.
DID-MDN (ours): Multi-stream densely connected network trained with the procedure of estimated label fusion.

The average PSNR and SSIM results evaluated on Test1 are tabulated in Table 2. As shown in Fig. 6, even though the single-stream network and Yang's multi-stream network [33] are able to successfully remove the rain-streak components, they both tend to over de-rain the image, producing blurry output. The multi-stream network without label fusion is unable to accurately estimate the rain-density level and hence tends to leave some rain streaks in the de-rained image (especially in the region around the light). In contrast, the proposed multi-stream network with label fusion is capable of removing rain streaks while preserving the background details. Similar observations can be made from the quantitative results shown in Table 2.

Footnote 6: To better demonstrate the effectiveness of our proposed multi-stream network compared with the state-of-the-art multi-scale structure proposed in [33], we replace our multi-stream dense-net part with the multi-scale structure in [33] and keep all the other parts the same.

Figure 6: Results of the ablation study on a synthetic image. Panels: Input, Single, Yang-Multi [33], Multi-no-label, DID-MDN, Ground Truth, each annotated with PSNR and SSIM.

4.4. Results on Two Synthetic Datasets
We compare the quantitative and qualitative performance of different methods on the test images from the two synthetic datasets, Test1 and Test2. Quantitative results corresponding to different methods are tabulated in Table 1. It can be clearly observed that the proposed DID-MDN achieves superior quantitative performance. To visually demonstrate the improvements obtained by the proposed method on the synthetic datasets, results on two sample images selected from Test2 and one sample chosen from our newly synthesized Test1 are presented in Fig. 7. Note that we selectively sample images from all three conditions to show that our method performs well under different variations (footnote 7). While the JORDER method [33] is able to remove some parts of the rain streaks, it still tends to leave some rain streaks in the de-rained images. Similar results are also observed for [41]. Even though the method of Fu et al. [6] is able to remove the rain streaks, especially in the medium and light rain conditions, it tends to remove some important details as well, such as the flower details shown in the second row and the window structures shown in the third row (details can be better observed by zooming in on the figure). Overall, the proposed method is able to preserve better details while effectively removing the rain-streak components.

Footnote 7: Due to space limitations and for better comparisons, we only show the results corresponding to the most recent state-of-the-art methods [33, 6, 41] in the main paper. More results corresponding to the other methods [19, 17, 5] can be found in the Supplementary Material.

Figure 7: Rain-streak removal results on sample images from the synthetic datasets Test1 and Test2. Columns: Input, JORDER (CVPR 17) [33], DDN (CVPR 17) [6], JBO (ICCV 17) [41], DID-MDN, Ground Truth, each annotated with PSNR and SSIM.

4.5. Results on Real-World Images
The performance of the proposed method is also evaluated on many real-world images downloaded from the Internet and on real-world images published by the authors of [36, 6]. The de-raining results are shown in Fig. 8. As before, previous methods tend to either under de-rain or over de-rain the images. In contrast, the proposed method achieves better results in terms of effectively removing rain streaks while preserving the image details. In addition, it can be observed that the proposed method is able to deal with different types of rain conditions, such as the heavy rain shown in the second row of Fig. 8 and the medium rain shown in the fifth row of Fig. 8. Furthermore, the proposed method can effectively deal with rain streaks of different shapes and scales, such as the small round rain streaks shown in the third row of Fig. 8 and the long thin rain streaks in the second row of Fig. 8. Overall, the results evaluated on real-world images captured under different rain conditions demonstrate the effectiveness and robustness of the proposed DID-MDN method. More results can be found in the Supplementary Material.

Figure 8: Rain-streak removal results on sample real-world images. Columns: Input, JORDER (CVPR 17) [33], DDN (CVPR 17) [6], JBO (ICCV 17) [41], DID-MDN.

4.6. Running Time Comparisons
Running time comparisons are shown in Table 4. It can be observed that the testing time of the proposed DID-MDN is comparable to that of the DDN [6] method. On average, it takes about 0.3s to de-rain an image of size 512×512.

Table 4: Running time (in seconds) for different methods, averaged over 1,000 images of size 512×512.
Method:    DSC     GMM     CNN (GPU)   JORDER (GPU)   DDN (GPU)   JBO (CPU)   DID-MDN (GPU)
Time (s):  189.3   674.8   2.8         600.6          0.3         1.4         0.2

5. Conclusion
In this paper, we propose a novel density-aware image de-raining method with a multi-stream densely connected network (DID-MDN) for joint rain-density estimation and de-raining. In comparison to existing approaches, which attempt to solve the de-raining problem using a single network to learn to remove rain streaks with different densities (heavy, medium and light), we investigated the use of an estimated rain-density label to guide the synthesis of the de-rained image. To efficiently predict the rain-density label, a residual-aware rain-density classifier is proposed in this paper. Detailed experiments and comparisons are performed on two synthetic datasets and one real-world dataset to demonstrate that the proposed DID-MDN method significantly outperforms many recent state-of-the-art methods. Additionally, the proposed DID-MDN method is compared against baseline configurations to illustrate the performance gains obtained by each module.

References
[1] D.-Y. Chen, C.-C. Chen, and L.-W. Kang. Visual depth guided color image rain streaks removal using sparse coding. IEEE Transactions on Circuits and Systems for Video Technology, 24(8).
[2] Y.-L. Chen and C.-T. Hsu. A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In IEEE ICCV.
[3] D. Eigen, D. Krishnan, and R. Fergus. Restoring an image taken through a window covered with dirt or rain. In ICCV.
[4] D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In NIPS.
[5] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley. Clearing the skies: A deep network architecture for single-image rain removal. IEEE Transactions on Image Processing, 26(6).
[6] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley. Removing rain from single images via a deep detail network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision. Springer.
[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[9] D.-A. Huang, L.-W. Kang, Y.-C. F. Wang, and C.-W. Lin. Self-learning based image decomposition with applications to single image denoising. IEEE Transactions on Multimedia, 16(1):83-93.
[10] D.-A. Huang, L.-W. Kang, M.-C. Yang, C.-W. Lin, and Y.-C. F. Wang. Context-aware single image rain removal. In Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE.
[11] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. Densely connected convolutional networks. arXiv preprint.
[12] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio. The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. IEEE.
[13] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision. Springer.
[14] L.-W. Kang, C.-W. Lin, and Y.-H. Fu. Automatic single-image-based rain streaks removal via image decomposition. IEEE TIP, 21(4).
[15] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8.
[16] K. Li, Y. Kong, and Y. Fu. Multi-stream deep similarity learning networks for visual tracking. In IJCAI.
[17] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown. Rain streak removal using layer priors. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June.
[18] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[19] Y. Luo, Y. Xu, and H. Ji. Removing rain from a single image via discriminative sparse coding. In ICCV.
[20] X. Peng, R. S. Feris, X. Wang, and D. N. Metaxas. A recurrent encoder-decoder network for sequential face alignment. In European Conference on Computer Vision. Springer International Publishing.
[21] X. Peng, X. Yu, K. Sohn, D. Metaxas, and M. Chandraker. Reconstruction for feature disentanglement in pose-invariant face recognition. In ICCV.
[22] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang. Single image dehazing via multi-scale convolutional neural networks. In ECCV. Springer.
[23] W. Ren, J. Tian, Z. Han, A. Chan, and Y. Tang. Video desnowing and deraining based on matrix decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[24] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer.
[25] V. Santhaseelan and V. K. Asari. Utilizing local phase information to remove rain from video. International Journal of Computer Vision, 112(1):71-89.
[26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.
[27] V. A. Sindagi and V. M. Patel. Generating high-quality crowd density maps using contextual pyramid CNNs. In ICCV.
[28] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE TIP, 13(4).
[29] W. Wei, L. Yi, Q. Xie, Q. Zhao, D. Meng, and Z. Xu. Should we encode rain streaks in video as deterministic or stochastic? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[30] S. Xie and Z. Tu. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision.
[31] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In CVPR.
[32] J. Xue, H. Zhang, K. Dana, and K. Nishino. Differential angular imaging for material recognition. In CVPR.
[33] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

10 [34] H. Zhang and K. Dana. Multi-style generative network for real-time transfer. arxiv preprint arxiv: , [35] H. Zhang and V. M. Patel. Convolutional sparse and lowrank coding-based rain streak removal. In Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on, pages IEEE, [36] H. Zhang, V. Sindagi, and V. M. Patel. Image de-raining using a conditional generative adversarial network. arxiv preprint arxiv: , [37] H. Zhang, V. Sindagi, and V. M. Patel. Joint transmission map estimation and dehazing using deep networks. arxiv preprint arxiv: , [38] Z. Zhang, Y. Xie, F. Xing, M. McGough, and L. Yang. Mdnet: A semantically and visually interpretable medical image diagnosis network. In CVPR, [39] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In Proceedings of the IEEE International Conference on Computer Vision, pages 1 8, [40] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages , [41] L. Zhu, C.-W. Fu, D. Lischinski, and P.-A. Heng. Joint bilayer optimization for single-image rain streak removal. In Proceedings of the IEEE international conference on computer vision, pages , [42] Y. Zhu, Z. Lan, S. Newsam, and A. G. Hauptmann. Hidden two-stream convolutional networks for action recognition. arxiv preprint arxiv: ,

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh National Institute of Advanced Industrial Science and Technology (AIST) Tsukuba,

More information

SINGLE image super-resolution (SR) aims to reconstruct

SINGLE image super-resolution (SR) aims to reconstruct Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang 1 arxiv:1710.01992v3 [cs.cv] 9 Aug 2018 Abstract Convolutional

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Elastic Neural Networks for Classification

Elastic Neural Networks for Classification Elastic Neural Networks for Classification Yi Zhou 1, Yue Bai 1, Shuvra S. Bhattacharyya 1, 2 and Heikki Huttunen 1 1 Tampere University of Technology, Finland, 2 University of Maryland, USA arxiv:1810.00589v3

More information

Generative Adversarial Network-Based Restoration of Speckled SAR Images

Generative Adversarial Network-Based Restoration of Speckled SAR Images Generative Adversarial Network-Based Restoration of Speckled SAR Images Puyang Wang, Student Member, IEEE, He Zhang, Student Member, IEEE and Vishal M. Patel, Senior Member, IEEE Abstract Synthetic Aperture

More information

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

More information

arxiv: v1 [cs.cv] 22 Feb 2017

arxiv: v1 [cs.cv] 22 Feb 2017 Synthesising Dynamic Textures using Convolutional Neural Networks arxiv:1702.07006v1 [cs.cv] 22 Feb 2017 Christina M. Funke, 1, 2, 3, Leon A. Gatys, 1, 2, 4, Alexander S. Ecker 1, 2, 5 1, 2, 3, 6 and Matthias

More information

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting

CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting Vishwanath A. Sindagi Vishal M. Patel Department of Electrical and Computer Engineering, Rutgers University

More information

Introduction. Prior work BYNET: IMAGE SUPER RESOLUTION WITH A BYPASS CONNECTION NETWORK. Bjo rn Stenger. Rakuten Institute of Technology

Introduction. Prior work BYNET: IMAGE SUPER RESOLUTION WITH A BYPASS CONNECTION NETWORK. Bjo rn Stenger. Rakuten Institute of Technology BYNET: IMAGE SUPER RESOLUTION WITH A BYPASS CONNECTION NETWORK Jiu Xu Yeongnam Chae Bjo rn Stenger Rakuten Institute of Technology ABSTRACT This paper proposes a deep residual network, ByNet, for the single

More information

SINGLE image super-resolution (SR) aims to reconstruct

SINGLE image super-resolution (SR) aims to reconstruct Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang 1 arxiv:1710.01992v2 [cs.cv] 11 Oct 2017 Abstract Convolutional

More information

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors [Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

arxiv: v1 [cs.cv] 6 Sep 2018

arxiv: v1 [cs.cv] 6 Sep 2018 arxiv:1809.01890v1 [cs.cv] 6 Sep 2018 Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto

More information

Image Restoration with Deep Generative Models

Image Restoration with Deep Generative Models Image Restoration with Deep Generative Models Raymond A. Yeh *, Teck-Yian Lim *, Chen Chen, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do Department of Electrical and Computer Engineering, University

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

A Deep Learning Approach to Vehicle Speed Estimation

A Deep Learning Approach to Vehicle Speed Estimation A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,

More information