2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

Video Frame Interpolation Using Recurrent Convolutional Layers

Zhifeng Zhang 1, Li Song 1,2, Rong Xie 2, Li Chen 1
1 Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
2 Cooperative Medianet Innovation Center, Shanghai, China
{maplezzf, song li, xierong, hilichen}@sjtu.edu.cn

Abstract — Frame interpolation attempts to generate intermediate frames given existing ones, which is challenging because of complex video scenes and motion. Standard methods first estimate the motion between two consecutive frames and then synthesize new ones. In this paper, we propose a novel frame interpolation method based on the video synthesis approach deep voxel flow (DVF). In DVF, a deep convolutional encoder-decoder predicts 3D voxel flow, and a volume sampling layer then synthesizes the intermediate frame guided by that flow. To improve the accuracy of the voxel flow, we employ recurrent convolutional layers (RCL) in the encoder-decoder module to refine the flow step by step; we call the resulting model DVF-RCL. We also incorporate a perceptual loss to increase visual quality. Experiments demonstrate that our method greatly improves on the original DVF and produces results that compare favorably to state-of-the-art methods both quantitatively and qualitatively.

Index Terms — frame interpolation, recurrent convolutional layers, deep voxel flow, video processing

I. INTRODUCTION

Video frame interpolation is a classic problem in computer vision and video processing. It attempts to synthesize one or more intermediate frames from existing ones and is widely used in many applications. For example, increasing the video frame rate is required in video transcoding systems to improve visual quality [1]. In addition, frames are often dropped during video transmission due to limited bandwidth at the sending end, and they must be recovered via interpolation at the receiving end [2].

Traditional frame interpolation methods usually take two steps: motion estimation between adjacent frames, followed by pixel synthesis guided by the motion. However, the performance of these methods relies heavily on the accuracy of optical flow, which is hard to estimate in regions with occlusion, motion blur, large displacement, and abrupt lighting changes [3].

In recent years, deep convolutional neural networks (CNNs) have shown remarkable performance on many computer vision problems. CNN-based methods have set new state-of-the-art results on high-level vision tasks such as image classification [4] and object detection [5], and deep learning approaches also produce impressive results in image super-resolution [6] and other low-level vision problems [7]. More recently, optical flow estimation has been addressed as a supervised learning problem with ground-truth flow [8], [9]. CNN-based approaches are also promising for frame synthesis with end-to-end models, although improvement is still needed to better handle large displacement and to generate visually more pleasing results.

[Fig. 1. Visual example of frame interpolation: (a) ground truth, (b) DVF [10], (c) Ours-L1, (d) Ours-LF. Compared to the original DVF [10] (b), our proposed method produces visually more pleasing results, especially the variant with perceptual loss LF (d).]

In this paper, we present a novel method for the video frame interpolation problem.
Our approach is an end-to-end network based on the frame synthesis method Deep Voxel Flow (DVF) [10]. The model has two parts: a convolutional encoder-decoder predicts 3D voxel flow, and a volume sampling layer then synthesizes the intermediate frame guided by that flow. In particular, in the decoder, recurrent convolutional layers (RCL) [11] progressively leverage feature maps from lower-level convolutional layers and refine the voxel flow step by step: the network recurrently generates a refined voxel flow at 2x the resolution until it matches the original input resolution. In this way we obtain a more accurate flow estimate than the original DVF. In addition, perceptual loss functions are incorporated to further improve the visual quality of the synthesized frames, as illustrated in Fig. 1. To the best of our knowledge, we are the first to utilize RCL for video frame interpolation, and we achieve state-of-the-art results on the UCF-101 test set [12].

The rest of this paper is organized as follows: Section II introduces related work on video frame interpolation. Section III describes the details of our proposed method. Experimental results are given in Section IV. Finally, Section V draws a conclusion.

[Fig. 2. Architecture of our proposed method DVF-RCL. Given input frames I0 and I1, voxel flow is predicted by a convolutional encoder-decoder network and then used to synthesize the interpolated frame via a volume sampling layer. Recurrent convolutional layers are used in the decoder to refine the voxel flow step by step. The diagram legend distinguishes convolution, average pooling, bilinear upsampling, volume sampling, and recurrent convolutional layers, plus recurrent and concatenation connections.]

II. RELATED WORK

Video frame interpolation is one of the basic video processing technologies; it attempts to synthesize intermediate frames given existing ones. Common frame interpolation approaches first estimate dense motion, especially optical flow, between consecutive input frames and then generate one or more middle frames based on the estimated motion [13]. Multiple methods can be used for motion estimation, including traditional motion-compensated methods [14] and recent neural-network-based methods [8]. However, flow amplitudes vary greatly, from slow motion to large displacement, which makes accurate prediction challenging. Given the estimated optical flow between two consecutive frames, an intermediate frame can be synthesized by projecting pixel values bidirectionally from those frames; the details of a standard interpolation algorithm can be found in [13]. The quality of the synthesized frames therefore depends on the accuracy of both the optical flow and the interpolation algorithm.

Different from these flow-based methods, Mahajan et al. developed a path framework that copies pixel gradients from the input images to the interpolated frame via a Poisson reconstruction [15]. Meyer et al. presented a phase-based technique for video frame interpolation [16]. Although this method often generates impressive results, further improvement is still required to handle larger motions and maintain high-frequency details.

Recently, convolutional neural networks have achieved state-of-the-art results in many computer vision tasks, and they can also be applied to optical flow estimation [8], [9]. However, these approaches require supervision, i.e., optical flow ground truth, which is difficult to obtain. Long et al. applied a convolutional neural network to frame interpolation and then inverted the learned CNN; however, their method generates the interpolated frame only as an intermediate step, and their end goal is optical flow estimation [17]. Zhou et al. trained a CNN to predict appearance flow and then reconstructed novel views from this estimate [18]; their method can produce a frame between the inputs by warping the individual input views with the appearance flows.

A number of papers directly generate images or videos using CNNs. Mathieu et al. presented a multi-scale network for video prediction using a gradient difference loss function and adversarial training, but artifacts and blurriness remain a problem for this method [19]. Liu et al. combined the strengths of optical-flow-based and neural-network-based methods [10]: they predicted 3D voxel flow using a deep, fully differentiable network and then synthesized new frames by flowing pixel values from existing ones, but the resulting frames are still not visually satisfying.
Niklaus et al. cast pixel synthesis for frame interpolation as local convolution over the input frames and employed a deep CNN to estimate a spatially-adaptive 2D convolution kernel for each pixel, capturing both local motion and resampling coefficients [3]. A new frame can then be generated by convolving the kernels with the input frames, but the memory demand increases quadratically with the kernel size. Their extended work approximates each 2D kernel with a pair of 1D kernels, which requires far fewer parameters [20]. When handling large motion, however, memory remains a problem for this adaptive separable convolution method.

III. PROPOSED METHOD

Our method is based on an end-to-end, fully differentiable CNN framework that generates intermediate frames directly from the input frames, referred to as Deep Voxel Flow (DVF) [10]. Starting from this baseline model, we describe our proposed method, called DVF-RCL, which utilizes recurrent convolutional layers (RCL) [11] to predict the voxel flow step by step and improve the quality of the synthesized frames.

A. Network Architecture

1) Deep Voxel Flow: We first briefly describe our baseline model DVF and define notation. DVF predicts 3D voxel flow using a convolutional encoder-decoder and then synthesizes the desired frame with a volume sampling layer. Denote the 3D voxel flow field by $F = \{\Delta x, \Delta y, \Delta t\}$, which we separate into $F_{motion} = \{\Delta x, \Delta y\}$ and $F_{mask} = \{\Delta t\}$. The spatial component $F_{motion}$ represents the optical flow from the intermediate frame to the next frame, and the temporal component $F_{mask}$ serves as the sampling weights for trilinear interpolation. The interpolated frame is synthesized from the voxel flow $F$ as

$\hat{I} = \Delta t \odot \tilde{I}_0 + (1 - \Delta t) \odot \tilde{I}_1$   (1)

where $\tilde{I}_0$ and $\tilde{I}_1$ are the two input frames resampled according to the voxel flow $F$, and $\odot$ denotes the Hadamard product. This formulation was introduced in [10] for video frame synthesis; it combines the advantages of flow-based and CNN-based methods and merges the two steps of flow-based methods into a single process. The volume sampling layer only performs a spatial transformation and contains no learnable parameters, so the performance of DVF hinges on the encoder-decoder, which directly determines the accuracy of the voxel flow.
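To make Eq. (1) concrete, the following is a minimal PyTorch sketch of the volume sampling step. It is an illustration under our assumptions (flow stored in pixel units, $I_0$ warped backward and $I_1$ forward along the flow), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def volume_sample(i0, i1, voxel_flow):
    """Synthesize the intermediate frame from 3D voxel flow (Eq. 1).

    i0, i1:      input frames, shape (B, C, H, W)
    voxel_flow:  (B, 3, H, W) -- channels (dx, dy, dt), with the spatial
                 flow in pixels and dt in [0, 1].
    """
    b, _, h, w = voxel_flow.shape
    dx, dy = voxel_flow[:, 0], voxel_flow[:, 1]
    dt = voxel_flow[:, 2:3]  # keep the channel dim for broadcasting

    # Base sampling grid in the normalized [-1, 1] coordinates grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).to(voxel_flow).expand(b, h, w, 2)

    # Convert the pixel-space flow into the normalized coordinate system.
    flow = torch.stack((2.0 * dx / (w - 1), 2.0 * dy / (h - 1)), dim=-1)

    # Resample I0 backward along the flow and I1 forward along it.
    i0_warp = F.grid_sample(i0, base - flow, align_corners=True)
    i1_warp = F.grid_sample(i1, base + flow, align_corners=True)

    # The temporal mask dt blends the two warped frames (Hadamard product).
    return dt * i0_warp + (1.0 - dt) * i1_warp
```

Because every step is differentiable, gradients flow through the sampling into the encoder-decoder, which is what makes the model trainable end to end.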
2) Recurrent Convolutional Layer: The recurrent convolutional layer (RCL) was proposed in [11] for object recognition and has also been used for video saliency estimation [21] and object segmentation [22]. The basic idea is to add recurrent connections along the time axis within every convolutional layer of a feed-forward CNN. This structure lets units be modulated by other units in the same layer, so the effective receptive field of a layer grows as the time steps increase. Because weights are shared across time steps, the recurrent connections increase the depth of the original CNN without adding parameters [11]. Stacks of RCLs can therefore leverage local information to refine the details of the voxel flow step by step.

Fig. 2 illustrates the framework of our proposed DVF-RCL. As in the original DVF, an encoder-decoder network predicts the 3D voxel flow and a volume sampling layer synthesizes the intermediate frame. For the encoder, we adopt the encoder module from SepConv [20] for its strong ability to extract motion features from the input frames; specifically, we use stacks of three 3x3 convolution layers with ReLU activation and downsample the feature maps 5 times. In the decoder, recurrent convolutional layers fuse the voxel flow with encoder feature maps: the coarse initial voxel flow is upsampled 2x and concatenated with the corresponding lower-level feature maps from the encoder as the input of an RCL, and the refined voxel flow is obtained from the last convolutional layer. In this way the voxel flow is recovered progressively from coarse to fine until it reaches the full image resolution. We use 3x3 convolution kernels in both the feed-forward and recurrent connections, and feature maps are generated at each time step of the RCL. We replace local response normalization (LRN) with batch normalization [23] for better interpolation results; all other parameters are the same as in [11].

[Fig. 3. The detailed framework of the RCL, unfolded over 3 time steps, with the feed-forward and recurrent connections shown separately. n denotes the number of lower-level feature maps from the encoder, so the RCL input has n + 3 channels.]

Fig. 3 presents the RCL unfolded over 3 time steps. The upsampled initial voxel flow is concatenated with the lower-level feature maps from the corresponding encoder stage to form the initial input of the RCL. At each time step, the RCL receives inputs from both the feed-forward connection and the recurrent connection, and its outputs are passed to the next time step. In [11], 3 time steps were found to work best for object recognition, but this has not been verified for other problems; we therefore try RCLs with different numbers of time steps in our experiments to find the best setting for our model.
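As a concrete illustration of this refinement scheme, here is a minimal PyTorch sketch of a weight-shared RCL and one coarse-to-fine decoder step. The channel sizes, the per-step batch-norm layers, and the `to_flow` projection are our assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RCL(nn.Module):
    """Recurrent convolutional layer in the spirit of Liang & Hu [11]:
    a feed-forward 3x3 conv plus a recurrent 3x3 conv whose weights are
    shared across time steps, so unrolling adds depth but no parameters."""

    def __init__(self, in_ch, out_ch, steps=3):
        super().__init__()
        self.steps = steps
        self.ff = nn.Conv2d(in_ch, out_ch, 3, padding=1)    # feed-forward path
        self.rec = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # recurrent path
        # Batch norm replaces the LRN of [11]; one BN per unrolling stage.
        self.bn = nn.ModuleList(nn.BatchNorm2d(out_ch) for _ in range(steps + 1))

    def forward(self, x):
        ff = self.ff(x)                      # t = 0: feed-forward only
        state = F.relu(self.bn[0](ff))
        for t in range(1, self.steps + 1):   # t >= 1: add the recurrent input
            state = F.relu(self.bn[t](ff + self.rec(state)))
        return state


def refine_flow(coarse_flow, skip_feats, rcl, to_flow):
    """One decoder step: upsample the voxel flow 2x, concatenate the
    encoder skip features (n + 3 input channels), refine with an RCL,
    and project back to the 3 voxel-flow channels."""
    up = F.interpolate(coarse_flow, scale_factor=2.0, mode="bilinear",
                       align_corners=False)
    return to_flow(rcl(torch.cat((up, skip_feats), dim=1)))
```

A full decoder would call `refine_flow` once per scale, with `to_flow` being a small convolution, until the flow reaches the input resolution.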

B. Network Training

Following [20], we consider two types of loss functions for training our model. The first is an l1-norm reconstruction loss based on the pixel-wise color difference between the synthesized frame $\hat{I}$ and the ground truth $I_{gt}$:

$L_1 = \|\hat{I} - I_{gt}\|_1$   (2)

An l2-norm loss could be used instead, but we found it leads to blurrier results than the l1 norm, in line with the findings of [20]. However, the ability of the l1 loss to capture perceptual differences, such as high-frequency details, is limited because it is defined purely on pixel-wise color differences.

The second type is a perceptual loss, which has proved effective in artistic style transfer [24] and image super-resolution [25] and can generate visually more appealing results than pixel-wise distance metrics. We therefore employ high-level features from the relu5_4 layer of a pre-trained VGG-19 network as a perceptual loss. The combined distance metric is defined as

$L_F = \|\hat{I} - I_{gt}\|_1 + \lambda_{vgg} \|\phi(\hat{I}) - \phi(I_{gt})\|_2^2$   (3)

where $\phi(\hat{I})$ and $\phi(I_{gt})$ are the high-level VGG features of the two images and $\lambda_{vgg}$ is an empirically chosen weight.
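The combined objective can be written compactly in PyTorch, as in the sketch below. The $\lambda_{vgg}$ value, the mean-reduced norms, and the exact index of the relu5_4 slice in torchvision's VGG-19 are assumptions we fill in, since the paper does not spell them out.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class CombinedLoss(nn.Module):
    """L_F of Eq. (3): a pixel-wise l1 term plus a VGG-19 feature term."""

    def __init__(self, lambda_vgg=0.01):  # lambda_vgg: assumed value
        super().__init__()
        # Layers 0..35 of torchvision's VGG-19 end at relu5_4 (assumed
        # index); the loss network is frozen and used only for features.
        vgg = models.vgg19(pretrained=True).features[:36].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg
        self.lambda_vgg = lambda_vgg

    def forward(self, pred, target):
        # Mean-reduced l1, a common normalization of Eq. (2); inputs are
        # assumed to be pre-normalized to the VGG training statistics.
        l1 = torch.abs(pred - target).mean()
        feat = ((self.vgg(pred) - self.vgg(target)) ** 2).mean()
        return l1 + self.lambda_vgg * feat
```

Training with `lambda_vgg = 0` recovers the plain $L_1$ model, so both variants discussed below can share one implementation.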
IV. EXPERIMENTS

A. Datasets

Our training data comes from the UCF-101 training set [12], which contains videos from 101 action categories; it is a standard action recognition dataset with varied scenes and motion. Training samples were extracted by randomly cropping patches from three consecutive frames. From the extracted triplets we selected samples with obvious motion, discarding samples whose frames are temporally too close to each other; to measure the motion, DeepFlow2 [26] was used to predict the optical flow between frames. Overall, we randomly selected 100,000 triplets to compose our training dataset. Following [10], the UCF-101 test set is chosen as the benchmark for evaluation. We report both PSNR and SSIM, computed on the luminance channel of the images.
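For completeness, here is a small sketch of how the two metrics can be computed on the luminance channel with scikit-image; the BT.601 luma coefficients and the 8-bit data range are our assumptions, as the paper does not specify its exact conversion.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def luminance(rgb):
    """ITU-R BT.601 luma from an 8-bit RGB image of shape (H, W, 3)."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def evaluate(pred_rgb, gt_rgb):
    """PSNR and SSIM on the luminance channel, as reported in Table I."""
    y_pred, y_gt = luminance(pred_rgb), luminance(gt_rgb)
    psnr = peak_signal_noise_ratio(y_gt, y_pred, data_range=255)
    ssim = structural_similarity(y_gt, y_pred, data_range=255)
    return psnr, ssim
```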

B. Parameter Settings

All convolutional layers are initialized with the Xavier method [28]. We trained the network with the Adam optimizer [29], with β1 = 0.9 and a fixed, empirically chosen learning rate. As in [20], a small mini-batch size of 16 samples was selected, since we found that larger mini-batches led to quality degradation. The framework was implemented in PyTorch [30], with both training and testing run on an NVIDIA GeForce GTX 1080 GPU.

C. Analysis of the Recurrent Convolutional Layer

To find the best hyper-parameter setting, we trained our model with RCLs of different numbers of time steps. Fig. 4 shows the PSNR results for time steps from 0 to 5; note that at t = 0 only feed-forward computation takes place and the model degrades into a plain version without recurrent connections.

[Fig. 4. PSNR results for different numbers of time steps in the RCL. At t = 0 only feed-forward computation takes place.]

We find that DVF-RCL with 3 time steps performs better than versions with fewer time steps. The effective receptive field of an RCL unit in the feature maps expands as the time step increases [22], so employing RCLs progressively refines the voxel flow and improves its final accuracy. However, using 4 or more time steps brings no further gain and even degrades performance slightly. We therefore set the number of time steps to 3 in our final model.

[Fig. 5. Visual example of frame interpolation results, comparing our method with different loss functions to the original DVF [10] and SepConv [20].]

Fig. 5 compares visual examples from the original DVF and our DVF-RCL. The results show that DVF-RCL greatly improves the visual quality of the original model. The example depicts a scene with large motion that is hard to estimate: the frame interpolated by DVF tends to be blurry due to inaccurate voxel flow, whereas our method predicts a more accurate voxel flow and produces visually pleasing results. Compared to SepConv with either of its two loss functions, our method also performs better in regions with large displacement.

D. Perceptual Loss

Two loss functions are considered in our model: the pixel-wise loss L1 and the perceptual loss LF. It is well known that pixel-wise distance metrics alone often produce over-smoothed results and that adding a perceptual loss can improve visual quality. We therefore trained two versions of our DVF-RCL model: one with the L1 loss alone and one that additionally uses the LF loss. As listed in Table I, the PSNR of the LF model is somewhat lower than that of the purely pixel-wise L1 model; however, the LF loss leads to sharper frames, as shown in Fig. 1 and Fig. 5, in line with the findings of recent work [20]. We also compare against the two SepConv variants, L1 and LF, in Fig. 5: consistent with our findings, SepConv-LF with the perceptual loss produces sharper results than SepConv-L1, but our DVF-RCL-LF generates visually more pleasing results than SepConv-LF. The example in Fig. 5 depicts a horse-racing scene with large motion; our method produces a perceptually more satisfying frame, especially around the horses' legs, demonstrating that our model handles large displacement properly.

E. Comparison with the State of the Art

In this section, we compare several state-of-the-art frame interpolation methods with our proposed model. We first consider two flow-based techniques, FlowNet2 [9] and MDP-Flow2 [27], with the interpolation algorithm described in [13] used to generate intermediate frames from the estimated optical flow. We also compare against the phase-based interpolation method [16]. For neural-network-based approaches, we include our baseline model DVF [10] and SepConv [20], reporting SepConv with both the L1 and LF losses for a complete comparison.

[TABLE I. Comparison of state-of-the-art methods in PSNR and SSIM on the UCF-101 test set: FlowNet2 [9], MDP-Flow2 [27], Meyer et al. [16], DVF [10], SepConv-L1 [20], SepConv-LF [20], Ours-L1, and Ours-LF.]

As shown in Table I, our DVF-RCL-L1 is the best-performing method in terms of PSNR. Fig. 6 shows visual examples from the different methods: the original DVF often generates blurry results with artifacts, whereas our DVF-RCL-LF produces visually pleasing frames competitive with SepConv-LF, which is also trained with a perceptual loss.

[Fig. 6. Visual comparisons among the different methods: ground truth, Ours-LF, FlowNet2 [9], MDP-Flow2 [27], Meyer et al. [16], DVF [10], and SepConv-LF [20].]

With respect to computational complexity, our model outperforms SepConv while producing comparable results. On an NVIDIA GeForce GTX 1080 GPU, SepConv takes 0.98 and 0.51 seconds to interpolate a frame at its two test resolutions, whereas our model synthesizes frames at the same resolutions in 0.52 and 0.23 seconds, about 2 times faster. Memory demand is another relevant difference between SepConv and our DVF-RCL: our method only needs about 23 MB to store the estimated 3D voxel flow for frame interpolation, whereas SepConv requires 1.27 GB [20] to hold the per-pixel kernels for a 1080p video frame. Handling larger motion consumes even more memory as the kernel size increases, which may become a limitation in practical applications.

F. Discussion

Our method currently generates an intermediate frame at t = 0.5 between two consecutive input frames. We could also train a multi-step model, like DVF, to interpolate multiple frames simultaneously, but that is not flexible enough to produce a frame at an arbitrary time. One possible solution is to feed the desired temporal step in with the inputs, as also mentioned in [20]. Furthermore, we could extract training samples with different time intervals to increase the variety of frame rates, which would enable our model to handle videos with a larger range of motion.

V. CONCLUSION

In this paper, we have presented a novel frame interpolation approach called DVF-RCL. Building on the video synthesis method DVF, recurrent convolutional layers (RCL) are employed in the encoder-decoder module to estimate the voxel flow step by step and increase its accuracy. We also use a perceptual loss to further improve the visual quality of the synthesized frames. Our experiments show that this approach greatly improves on the original DVF both quantitatively and qualitatively, and produces high-quality results comparable to state-of-the-art interpolation methods.

ACKNOWLEDGMENT

This work was supported by NSFC, the 111 Project (B07022), the Natural Science Foundation of Shanghai, and the Shanghai Key Laboratory of Digital Media Processing and Transmissions.

REFERENCES

[1] U. S. Kim and M. H. Sunwoo, "New frame rate up-conversion algorithms with low computational complexity," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 3, 2014.
[2] S. Sekiguchi, Y. Idehara, K. Sugimoto, and K. Asai, "A low-cost video frame-rate up conversion using compressed-domain information," in IEEE International Conference on Image Processing (ICIP), vol. 2, 2005.
[3] S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive convolution," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in European Conference on Computer Vision (ECCV), 2014.
[7] J. Xie, L. Xu, and E. Chen, "Image denoising and inpainting with deep neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
[8] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, and T. Brox, "FlowNet: Learning optical flow with convolutional networks," in IEEE International Conference on Computer Vision (ICCV), 2015.
[9] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, "FlowNet 2.0: Evolution of optical flow estimation with deep networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[10] Z. Liu, R. Yeh, X. Tang, Y. Liu, and A. Agarwala, "Video frame synthesis using deep voxel flow," in IEEE International Conference on Computer Vision (ICCV), 2017.
[11] M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[12] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," CRCV-TR-12-01, 2012.
[13] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," International Journal of Computer Vision (IJCV), vol. 92, no. 1, pp. 1-31, 2011.
[14] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in European Conference on Computer Vision (ECCV), 2004.
[15] D. Mahajan, F.-C. Huang, W. Matusik, R. Ramamoorthi, and P. Belhumeur, "Moving gradients: a path-based method for plausible image interpolation," ACM Transactions on Graphics (TOG), vol. 28, no. 3, 2009.
[16] S. Meyer, O. Wang, H. Zimmer, M. Grosse, and A. Sorkine-Hornung, "Phase-based frame interpolation for video," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[17] G. Long, L. Kneip, J. M. Alvarez, H. Li, X. Zhang, and Q. Yu, "Learning image matching by simply watching video," in European Conference on Computer Vision (ECCV), 2016.
[18] T. Zhou, S. Tulsiani, W. Sun, J. Malik, and A. A. Efros, "View synthesis by appearance flow," in European Conference on Computer Vision (ECCV), 2016.
[19] M. Mathieu, C. Couprie, and Y. LeCun, "Deep multi-scale video prediction beyond mean square error," arXiv preprint arXiv:1511.05440, 2015.
[20] S. Niklaus, L. Mai, and F. Liu, "Video frame interpolation via adaptive separable convolution," in IEEE International Conference on Computer Vision (ICCV), 2017.
[21] X. Wei, L. Song, R. Xie, and W. Zhang, "Two-stream recurrent convolutional neural networks for video saliency estimation," in IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), 2017.
[22] J. Xu, L. Song, and R. Xie, "Two-stream deep encoder-decoder architecture for fully automatic video object segmentation," in IEEE Visual Communications and Image Processing (VCIP), 2017.
[23] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[24] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision (ECCV), 2016.
[25] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., "Photo-realistic single image super-resolution using a generative adversarial network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[26] P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, "DeepFlow: Large displacement optical flow with deep matching," in IEEE International Conference on Computer Vision (ICCV), 2013.
[27] L. Xu, J. Jia, and Y. Matsushita, "Motion detail preserving optical flow estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 34, no. 9, 2012.
[28] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[29] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[30] A. Paszke, S. Gross, S. Chintala, and G. Chanan, "PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration," 2017.


More information

Single Image Depth Estimation via Deep Learning

Single Image Depth Estimation via Deep Learning Single Image Depth Estimation via Deep Learning Wei Song Stanford University Stanford, CA Abstract The goal of the project is to apply direct supervised deep learning to the problem of monocular depth

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

arxiv: v2 [cs.cv] 14 May 2018

arxiv: v2 [cs.cv] 14 May 2018 ContextVP: Fully Context-Aware Video Prediction Wonmin Byeon 1234, Qin Wang 1, Rupesh Kumar Srivastava 3, and Petros Koumoutsakos 1 arxiv:1710.08518v2 [cs.cv] 14 May 2018 Abstract Video prediction models

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Exploring Style Transfer: Extensions to Neural Style Transfer

Exploring Style Transfer: Extensions to Neural Style Transfer Exploring Style Transfer: Extensions to Neural Style Transfer Noah Makow Stanford University nmakow@stanford.edu Pablo Hernandez Stanford University pabloh2@stanford.edu Abstract Recent work by Gatys et

More information

An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks

An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks Report for Undergraduate Project - CS396A Vinayak Tantia (Roll No: 14805) Guide: Prof Gaurav Sharma CSE, IIT Kanpur, India

More information

Image Inpainting via Generative Multi-column Convolutional Neural Networks

Image Inpainting via Generative Multi-column Convolutional Neural Networks Image Inpainting via Generative Multi-column Convolutional Neural Networks Yi Wang 1 Xin Tao 1,2 Xiaojuan Qi 1 Xiaoyong Shen 2 Jiaya Jia 1,2 1 The Chinese University of Hong Kong 2 YouTu Lab, Tencent {yiwang,

More information