Bidirectional Recurrent Convolutional Networks for Video Super-Resolution

1 Bidirectional Recurrent Convolutional Networks for Video Super-Resolution. Qi Zhang & Yan Huang, Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). May 10, 2017

2 CRIPAC: CRIPAC mainly focuses on the following research topics related to national public security: biometrics; image and video analysis; big data and multi-modal computing; content security and authentication; sensing and information acquisition. CRIPAC receives regular funding from various government departments and agencies. It is also supported by funds of R&D projects from many other national and international sources. CRIPAC members publish widely in leading national and international journals and conferences such as IEEE Transactions on PAMI, IEEE Transactions on Image Processing, International Journal of Computer Vision, Pattern Recognition, Pattern Recognition Letters, ICCV, ECCV, CVPR, ACCV, ICPR, ICIP, etc.

3 NVAIL Artificial Intelligence Laboratory: research on artificial intelligence and deep learning

4 Outline: 1. Deep Learning 2. Recurrent Convolutional Networks 3. Application to Video Super-Resolution 4. Future Work

6 Deep Neural Networks (DNN). Originate from: simple/complex cell (Hubel and Wiesel); efficient error backpropagation (Linnainmaa); deep neocognitron, convolution (Fukushima); autoencoder (Ballard); backpropagation for CNN (LeCun); fundamental deep learning problem (Hochreiter); deep recurrent neural network (Schmidhuber); supervised LSTM RNN (Schmidhuber). Two drawbacks: a large number of parameters leads to high computational cost; a small training set leads to the over-fitting problem.

7 Two Recent Developments: big data and cheap computation. [Figure: video surveillance data size (PB)] DNN can thus be fitted efficiently.

8 Deep Learning: The Resurgence of DNN. Breakthrough in 2006. ImageNet: 74% vs. 85%. Representation learning. CNN for visual tasks: DeepFace, CVPR2014; RCNN for detection, CVPR2014. RNN for sequence analysis: activity recognition, CVPR2015; video caption, CVPR2015. Deep learning promotes the fast development of various areas of visual computing.

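10 Deep Neural Networks (DNN): $x \in \mathbb{R}^{d}$, $h \in \mathbb{R}^{n}$, $W \in \mathbb{R}^{d \times n}$; $h = \sigma(xW)$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$ is the sigmoid function. [Figure: network with input $x$, weights $W$, hidden layer $h$ and output $y$; plot of the sigmoid function $\sigma(t)$]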

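11 Recurrent Neural Networks (RNN): temporal dependency modeling. $x_t \in \mathbb{R}^{d}$, $h_t \in \mathbb{R}^{n}$, $W \in \mathbb{R}^{d \times n}$, $U \in \mathbb{R}^{n \times n}$; $h_t = \sigma(x_t W + h_{t-1} U)$. [Figure: DNN (input $x$, weights $W$, hidden layer $h$, output $y$) next to an RNN unrolled over inputs $x_1, x_2, x_3$ with hidden states $h_1, h_2, h_3$ linked by $U$]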
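
The recurrence on this slide can be sketched in a few lines of NumPy. This is only an illustration of $h_t = \sigma(x_t W + h_{t-1} U)$ using the slide's row-vector convention; the function names, the zero initial state and the random weights are assumptions made for the example, not part of the talk.

```python
import numpy as np

def sigmoid(t):
    """Sigmoid function from the DNN slide: 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + np.exp(-t))

def rnn_forward(xs, W, U, h0=None):
    """Apply h_t = sigmoid(x_t W + h_{t-1} U) over a sequence.

    xs: (T, d) inputs, W: (d, n), U: (n, n). Returns hidden states (T, n).
    The zero initial state is an assumption of this sketch.
    """
    n = W.shape[1]
    h = np.zeros(n) if h0 is None else h0
    hs = np.empty((xs.shape[0], n))
    for t in range(xs.shape[0]):
        h = sigmoid(xs[t] @ W + h @ U)  # temporal dependency enters through h @ U
        hs[t] = h
    return hs

# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
d, n, T = 4, 3, 5
xs = rng.standard_normal((T, d))
W = rng.standard_normal((d, n))
U = rng.standard_normal((n, n))
hs = rnn_forward(xs, W, U)
```

Setting `U` to zero removes the temporal link, and every step reduces to the DNN layer $h = \sigma(xW)$ of the previous slide.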

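12 Recurrent Convolutional Networks (RCN). DNN: Deep Neural Networks; RNN: Recurrent Neural Networks; CNN: Convolutional Neural Networks. [Figure: DNN becomes CNN by adding convolution and RNN by adding sequential modeling; RCN is reached from CNN by adding sequential modeling, or from RNN by adding convolution]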

13 Applications of RCN: Video SR, NIPS15 & TPAMI17; Scene Labeling, NIPS15; Weather Nowcasting, NIPS15; Action Recognition, ICLR15; Object Recognition, CVPR15; Person ReID, CVPR16

15 Video Super-Resolution. [Figure: high-resolution videos are displayed directly on high-resolution devices; low-resolution videos first go through super-resolution (denoising, deblurring, upscaling) before display] There is a great need for super-resolving low-resolution videos.

16 Two Main Approaches (1/2): 1. Single-image super-resolution [1-6]. One-to-one scheme: super-resolve each video frame independently. Ignores the intrinsic temporal dependency relation of video frames. Low computational complexity, fast. [1] Dong et al., Learning a deep convolutional network for image super resolution. ECCV. [2] Timofte et al., Anchored neighborhood regression for fast example-based super resolution. ICCV. [3] Zeyde et al., On single image scale-up using sparse-representations. Curves and Surfaces. [4] Yang et al., Image super-resolution via sparse representation. IEEE TIP. [5] Bevilacqua et al., Low-complexity single-image super resolution. BMVC. [6] Chang et al., Super-resolution through neighbor embedding. CVPR.

17 Two Main Approaches (2/2): 2. Multi-frame super-resolution [7-11]. Many-to-one scheme: use multiple adjacent frames to super-resolve a frame. Models the temporal dependency relation by motion estimation. High computational complexity, slow. [7] Liu and Sun, On bayesian adaptive video super resolution. IEEE PAMI. [8] Takeda et al., Super-resolution without explicit subpixel motion estimation. IEEE TIP. [9] Mitzel et al., Video super resolution using duality based TV-L1 optical flow. PR. [10] Protter et al., Generalizing the nonlocal-means to super-resolution reconstruction. IEEE TIP. [11] Fransens et al., Optical flow based super-resolution: A probabilistic approach. CVIU.

18 Motivation (RNN: Recurrent Neural Networks; SR: Super-Resolution). RNN can model long-term contextual information of temporal sequences well. The convolutional operation can scale to full videos of any spatial size and temporal step. Propose bidirectional recurrent convolutional networks, different from vanilla RNN: 1. Commonly-used full connections are replaced with weight-sharing convolutions. 2. Conditional convolutions are added for learning the visual-temporal dependency relation.

19 Bidirectional Recurrent Convolutional Networks: learn the spatial dependency between a low-resolution frame and its high-resolution result; model the long-term temporal dependency relation across video frames; enhance visual-temporal dependency relation modeling.
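
A rough sketch of one bidirectional recurrent convolutional layer may help make the three roles above concrete. It reads the feedforward convolution as the spatial term, the recurrent convolution as the temporal term, and the conditional convolution as the visual-temporal term. The single-channel frames, ReLU activation, zero initial state, the summing of the two directions, and every function name here are assumptions of this sketch and not details taken from the slides.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Single-channel 2-D cross-correlation, zero padded to keep the input size.
    Assumes an odd-sized kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def relu(a):
    return np.maximum(a, 0.0)  # activation choice is an assumption

def recurrent_conv_pass(frames, Wv, Wr, Wt, reverse=False):
    """One directional hidden layer over frames of shape (T, H, W)."""
    T = len(frames)
    order = range(T - 1, -1, -1) if reverse else range(T)
    hidden = np.zeros(frames.shape, dtype=float)
    prev_x = np.zeros(frames[0].shape, dtype=float)  # previously visited frame
    prev_h = np.zeros(frames[0].shape, dtype=float)  # its hidden state
    for t in order:
        pre = (conv2d_same(frames[t], Wv)    # feedforward (v): spatial dependency
               + conv2d_same(prev_h, Wr)     # recurrent (r): temporal dependency
               + conv2d_same(prev_x, Wt))    # conditional (t): visual-temporal dependency
        hidden[t] = relu(pre)
        prev_x, prev_h = frames[t], hidden[t]
    return hidden

def bidirectional_layer(frames, fwd_kernels, bwd_kernels):
    """Combine a forward and a backward pass; summing them is an assumption."""
    fwd = recurrent_conv_pass(frames, *fwd_kernels)
    bwd = recurrent_conv_pass(frames, *bwd_kernels, reverse=True)
    return fwd + bwd

# Toy data: 4 frames of 6x5 pixels, 3x3 kernels (sizes are illustrative only).
rng = np.random.default_rng(0)
frames = rng.random((4, 6, 5))
fwd_kernels = [0.1 * rng.random((3, 3)) for _ in range(3)]
bwd_kernels = [0.1 * rng.random((3, 3)) for _ in range(3)]
features = bidirectional_layer(frames, fwd_kernels, bwd_kernels)
```

Because every connection is a convolution with shared weights, the same kernels apply to frames of any spatial size and to sequences of any length, which is the scaling property claimed on the Motivation slide.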

20 Learning: Define an end-to-end mapping $O$ from low-resolution frames $X$ to high-resolution frames $Y$. Learning proceeds by optimizing the Mean Square Error (MSE) between predicted frames $O(X)$ and $Y$ with stochastic gradient descent: $L = \| O(X) - Y \|^{2}$. Small learning rate in the output layer: 1e-4.
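
The training rule can be illustrated with a toy example: an MSE loss and a gradient step that uses a smaller learning rate for the output layer. The two-parameter "network", the hidden-layer rate of 1e-3, and all names below are hypothetical stand-ins chosen for the sketch; only the MSE objective and the 1e-4 output-layer rate come from the slide.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between predicted and ground-truth frames."""
    return float(np.mean((pred - target) ** 2))

def sgd_step(params, grads, lrs):
    """One gradient step with a separate learning rate per parameter group."""
    return {k: params[k] - lrs[k] * grads[k] for k in params}

def forward(params, X):
    """Toy mapping O(X): two scalar gains standing in for hidden and output layers."""
    return X * params["hidden"] * params["output"]

def gradients(params, X, Y):
    """Analytic gradients of the MSE with respect to the two gains."""
    pred = forward(params, X)
    dpred = 2.0 * (pred - Y) / pred.size
    return {
        "hidden": float(np.sum(dpred * X * params["output"])),
        "output": float(np.sum(dpred * X * params["hidden"])),
    }

rng = np.random.default_rng(0)
X = rng.random((8, 8))          # stand-in low-resolution frame
Y = 2.0 * X                     # stand-in high-resolution target
params = {"hidden": 1.0, "output": 1.0}
lrs = {"hidden": 1e-3,          # assumed value, not given on the slide
       "output": 1e-4}          # output-layer rate from the slide

loss_before = mse_loss(forward(params, X), Y)
for _ in range(100):            # each step uses the single training sample
    params = sgd_step(params, gradients(params, X, Y), lrs)
loss_after = mse_loss(forward(params, X), Y)
```

With these rates the output gain moves roughly ten times less than the hidden gain, which is the intended effect of giving the output layer a small learning rate.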

21 Experiments: Train the model on 25 YUV-format video sequences with volume-based training (number of volumes: roughly 41,000; volume size: ). Test on a variety of real-world videos with severe motion blur, motion aliasing and complex motions. [Figure: training videos; testing videos]
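
Volume-based training can be pictured as cutting each video into small spatio-temporal blocks. The sketch below shows one way to do that in NumPy. The volume length, patch size and strides are hypothetical parameters, because the actual volume size is not recoverable from this slide, and `extract_volumes` is a name invented for the example.

```python
import numpy as np

def extract_volumes(video, t_len, size, t_stride, s_stride):
    """Cut a (T, H, W) video into volumes of shape (t_len, size, size).

    All four size/stride arguments are illustrative assumptions.
    """
    T, H, W = video.shape
    volumes = []
    for t0 in range(0, T - t_len + 1, t_stride):
        for r0 in range(0, H - size + 1, s_stride):
            for c0 in range(0, W - size + 1, s_stride):
                volumes.append(video[t0:t0 + t_len, r0:r0 + size, c0:c0 + size])
    return np.stack(volumes)

# Toy luminance video: 12 frames of 40x48 pixels.
rng = np.random.default_rng(0)
video = rng.random((12, 40, 48))
vols = extract_volumes(video, t_len=4, size=16, t_stride=4, s_stride=16)
```

Because the network is fully convolutional, a model trained on such small volumes can still be applied to full-size test videos.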

22 PSNR Comparison (PSNR: peak signal-to-noise ratio). Table 1: The results of PSNR (dB) and test time (sec) on the test video sequences. Surpasses state-of-the-art methods in PSNR, due to the effective temporal dependency modelling. [1] Video enhancer. version [4] Bevilacqua et al., Low-complexity single-image super resolution. BMVC. [5] Chang et al., Super-resolution through neighbor embedding. CVPR. [6] Dong et al., Learning a deep convolutional network for image super resolution. ECCV. [20] Takeda et al., Super-resolution without explicit subpixel motion estimation. IEEE TIP. [22] Timofte et al., Anchored neighborhood regression for fast example-based super resolution. ICCV. [24] Yang et al., Image super-resolution via sparse representation. IEEE TIP. [25] Zeyde et al., On single image scale-up using sparse-representations. Curves and Surfaces.
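
For reference, PSNR is conventionally computed as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The helper below is a minimal sketch of that standard definition; the 8-bit peak value of 255 and the function name are assumptions, and the slides do not say which channel or cropping was used for the reported numbers.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit pixel values."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Toy frames: a flat image and a copy that is off by one grey level everywhere.
reference = np.full((4, 4), 100.0)
estimate = reference + 1.0
value = psnr(reference, estimate)
```

A higher PSNR means a smaller mean squared error against the ground-truth frame, so it is directly tied to the MSE objective used for training.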

23 Model Architecture: Investigate the impact of our model architecture on the performance. Take a simplified network containing only feedforward (v) convolution as a benchmark. Study its variants by successively adding the bidirectional (b), recurrent (r) and conditional (t) schemes. Table 1: The results of PSNR (dB) by variants of BRCN on the testing video sequences.

24 Running Time. Figure: Speed vs. PSNR for all the comparison methods. Outperforms both single-image and multi-frame SR methods. Achieves comparable speed with the fastest single-image SR methods.

25 Closeup Comparison: Our method is able to recover more image details than others under severe motion conditions. Figure: Comparison among original frames (2nd, 3rd and 4th frames, from the top row to the bottom) of the Dancing video and super-resolved results by Bicubic, 3DSKR, ANR and BRCN, respectively.

26 Example. Upscaling factor: 4. Comparison: Bicubic (top), Ours (bottom)

27 Conclusion: Bidirectional Recurrent Convolutional Networks: bidirectional recurrent and conditional convolutions; an end-to-end framework, without pre/post-processing; good performance and fast speed. For more details, please refer to the following papers: 1. Yan Huang, Wei Wang, and Liang Wang, Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution. Advances in Neural Information Processing Systems (NIPS), 2015. 2. Yan Huang, Wei Wang, and Liang Wang, Video Super-Resolution via Bidirectional Recurrent Convolutional Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017, Accepted.

29 Future Work. For performance improvement: extend our model to have a deeper architecture, e.g., based on the 19-layer VGG net; incorporate some effective strategies, e.g., motion ensemble and residual connection. For speed acceleration: replace the pre-upsampling currently used by learning diverse upsampling filters with deconvolution layers. Others: collect a large-scale high-resolution video dataset, and try to learn our model directly from raw videos.

30 Acknowledgement: NVAIL Artificial Intelligence Laboratory, for sponsoring excellent hardware resources

31 THANK YOU

Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution

Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution Yan Huang 1 Wei Wang 1 Liang Wang 1,2 1 Center for Research on Intelligent Perception and Computing National Laboratory of