Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform. Xintao Wang Ke Yu Chao Dong Chen Change Loy


Problem: enlarge a low-resolution image 4 times to obtain a high-resolution image.

Previous work
Contemporary SR algorithms are mostly CNN-based methods [1].
MSE-based models: most CNN-based methods use a pixel-wise loss function. They are good at recovering edges and smooth areas, but not good at texture recovery.
GAN-based models: adversarial loss is introduced in SRGAN [2] and EnhanceNet [3]. It encourages the network to favor solutions that look more like natural images, and the visual quality of the reconstruction is significantly improved.
[Figure: SRCNN, SRGAN and ground-truth results]
[1] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[2] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[3] M. S. Sajjadi, B. Schölkopf, and M. Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In ICCV, 2017.

Motivation
[Figure: a building patch and a plant patch, each upscaled x4, and the results when the building and plant priors are swapped]

Semantic categorical prior: building, water, animal, sky, grass, plant, mountain.

Issues
1. How to represent the semantic categorical prior? Our approach: explore semantic segmentation probability maps as the categorical prior, up to pixel level.
2. How can the categorical prior be incorporated into the reconstruction process effectively? Our approach: propose a novel Spatial Feature Transform that is capable of altering the network behavior conditioned on other information.

Represent categorical prior
A contemporary CNN segmentation network [1] (ResNet-101), fine-tuned on LR images, predicts K categories.
Its probability maps serve as the semantic categorical prior; taking the argmax gives the segmentation.
[1] Z. Liu, X. Li, P. Luo, C.-C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.
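The step from per-pixel category scores to the K probability maps can be sketched as follows. This is a minimal pure-Python illustration, not the authors' segmentation network: it assumes the network emits one raw score per category and pixel and that a softmax normalises them. The names `softmax`, `probability_maps` and `label_map` are hypothetical.

```python
import math

def softmax(scores):
    """Convert K raw scores for one pixel into K probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def probability_maps(score_maps):
    """score_maps[k][i][j] is the raw score of category k at pixel (i, j).

    Returns maps of the same shape whose values sum to 1 over k at every pixel.
    """
    num_classes = len(score_maps)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for i in range(height):
        for j in range(width):
            pixel_probs = softmax([score_maps[k][i][j] for k in range(num_classes)])
            for k in range(num_classes):
                probs[k][i][j] = pixel_probs[k]
    return probs

def label_map(prob_maps):
    """Argmax over categories: one hard label per pixel."""
    num_classes = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [[max(range(num_classes), key=lambda k: prob_maps[k][i][j])
             for j in range(width)] for i in range(height)]
```

The probability maps keep the per-pixel uncertainty, whereas the argmax label map discards it.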

Examples on segmentation
[Figure: input LR images, ground-truth segments, segments predicted on HR images and segments predicted on LR images. Legend: sky, grass, building, mountain, plant, water, animal, background]

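Incorporate conditions
Categorical prior: $\Psi = (P_1, P_2, \ldots, P_K)$, where $P_1, P_2, \ldots, P_K$ are the probability maps.
CNN for SR: $\hat{y} = G_\theta(x)$, with input LR image $x$, network $G$ parametrized by $\theta$, and restored image $\hat{y}$.
How should $\Psi$ be incorporated? $\hat{y} = G_\theta(x \mid \Psi)$.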

Spatial Feature Transform
By learning a mapping function $M$, the prior $\Psi$ is modeled by a pair of affine transformation parameters $(\gamma, \beta)$: $M: \Psi \mapsto (\gamma, \beta)$.
The modulation is then carried out by an affine transformation on the feature maps $F$: $\mathrm{SFT}(F \mid \gamma, \beta) = \gamma \odot F + \beta$.
The conditioned network $\hat{y} = G_\theta(x \mid \Psi)$ therefore becomes $\hat{y} = G_\theta(x \mid \gamma, \beta)$.
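The affine modulation can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: `sft` applies the element-wise scale and shift, while `toy_mapping` is a hypothetical stand-in for the learned mapping M, assumed here to be a per-pixel linear map (like a 1x1 convolution) with hand-picked weights.

```python
def sft(features, gamma, beta):
    """Spatial feature transform: gamma * F + beta, element-wise.

    All three arguments are nested lists of shape [C][H][W], so every
    channel and every pixel gets its own scale and shift.
    """
    return [[[g * f + b for f, g, b in zip(f_row, g_row, b_row)]
             for f_row, g_row, b_row in zip(f_ch, g_ch, b_ch)]
            for f_ch, g_ch, b_ch in zip(features, gamma, beta)]

def toy_mapping(prob_maps, weights, bias):
    """Hypothetical stand-in for the learned mapping M (acts like a 1x1 conv).

    prob_maps: [K][H][W]; weights: [C][K]; bias: [C]. Returns a [C][H][W] map.
    """
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [[[b + sum(w_k * p[i][j] for w_k, p in zip(w_row, prob_maps))
              for j in range(width)] for i in range(height)]
            for w_row, b in zip(weights, bias)]

# Two categories (K=2), one feature channel (C=1), a 1x2 image.
prob_maps = [[[1.0, 0.0]], [[0.0, 1.0]]]
gamma = toy_mapping(prob_maps, weights=[[2.0, 0.5]], bias=[0.0])
beta = toy_mapping(prob_maps, weights=[[0.0, 1.0]], bias=[0.0])
features = [[[3.0, 3.0]]]
modulated = sft(features, gamma, beta)  # the two pixels are modulated differently
```

Because gamma and beta have the same spatial size as the features, two pixels with identical features but different category probabilities end up with different modulated values.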

[Figure: network architecture. A condition network of Conv layers turns the segmentation probability maps into conditions that are shared by all SFT layers. The SR network stacks residual blocks built from SFT layers and Conv layers, followed by upsampling and further Conv layers. Inside each SFT layer, Conv layers map the shared conditions to $(\gamma_i, \beta_i)$, which scale and shift the features.]

Loss function
Adversarial loss [1]: encourages the network to generate images that reside on the manifold of natural images. The generator and the discriminator compete:
$\min_\theta \max_\eta \; \mathbb{E}_{y \sim p_{HR}} \log D_\eta(y) + \mathbb{E}_{x \sim p_{LR}} \log\left(1 - D_\eta(G_\theta(x))\right)$
Perceptual loss [2]: uses a pre-trained 19-layer VGG network (features before conv5_4) to optimize the super-resolution model in a feature space:
$\left\| \phi_{VGG}(\hat{y}) - \phi_{VGG}(y) \right\|_2^2$
[1] I. Goodfellow, et al. Generative adversarial nets. In NIPS, 2014.
[2] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
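The two loss terms can be evaluated on toy numbers as below. This is a sketch under stated assumptions, not the training code: the discriminator outputs and the feature vectors are plain Python lists supplied by the caller, and the function names `discriminator_objective` and `perceptual_loss` are hypothetical.

```python
import math

def discriminator_objective(d_real, d_fake):
    """Batch estimate of E[log D(y)] + E[log(1 - D(G(x)))].

    d_real: discriminator outputs in (0, 1) on HR images y.
    d_fake: discriminator outputs in (0, 1) on restored images G(x).
    The discriminator tries to maximise this value; the generator tries to minimise it.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

def perceptual_loss(feat_restored, feat_target):
    """Squared L2 distance between two flattened feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(feat_restored, feat_target))
```

A discriminator that separates real from restored images well (for example outputs 0.9 on real and 0.1 on restored) yields a higher objective than one that outputs 0.5 everywhere, which is the signal the generator works against.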

Spatial condition
The modulation parameters $(\gamma, \beta)$ have a close relationship with the probability maps $P$ and contain spatial information.
[Figure: input LR patch, $P_{building}$ map, $P_{grass}$ map, $\gamma$ map of $C_6$, $\beta$ map of $C_7$, restored patch]

Delicate modulation
[Figure: LR patch and restored patch, with the $P_{plant}$ map, $\gamma$ map of $C_{51}$ and $\beta$ map of $C_1$, and the $P_{grass}$ map, $\gamma$ map of $C_{14}$ and $\beta$ map of $C_5$]

Results (PSNR)
SRCNN: 24.83 dB
SRGAN: 23.36 dB
EnhanceNet: 22.71 dB
SFT-Net (ours): 22.90 dB
[Figure: visual comparison of the four methods against the ground truth (GT)]

Results
[Figure: visual comparison of Bicubic, the MSE-based methods SRCNN, VDSR, LapSRN, DRRN and MemNet, the GAN-based methods EnhanceNet, SRGAN and SFT-Net (ours), and GT]

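User study part I
Pairwise preference (%): Ours 85 vs. EnhanceNet 15; Ours 67 vs. SRGAN 33.
Per-category values (%), in the order shown on the slide: sky 54.5, building 76.4, grass 68, animal 75, plant 56.4, water 68.7, mountain 65.7.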

User study part II
[Figure: stacked bar chart of the Rank-1 to Rank-4 shares (0% to 100%) for GT, Ours, MemNet and SRCNN. Values legible on the chart: 18.6, 79.6, 80.4, 18.4, 61.3, 36.3, 37, 62.4]

Impact of different priors
[Figure: a building example. Bicubic input and the results restored under each categorical prior: building, sky, grass, mountain, water, plant, animal]

Impact of different priors
[Figure: a mountain example. Bicubic input and the results restored under each categorical prior: building, sky, grass, mountain, water, plant, animal]

Other conditioning methods
Input concatenation
Compositional mapping [1]
FiLM [2]
[1] S. Zhu, S. Fidler, R. Urtasun, D. Lin, and C. C. Loy. Be your own Prada: Fashion synthesis with structural coherence. In ICCV, 2017.
[2] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville. FiLM: Visual reasoning with a general conditioning layer. arXiv preprint arXiv:1709.07871, 2017.

Comparison with other conditioning methods
[Figure: results of SFT-Net (ours), input concatenation, compositional mapping and FiLM]

Robustness to out-of-category
[Figure: two examples comparing SRGAN and ours]

Conclusion
Explore semantic segmentation maps as the categorical prior for realistic texture recovery.
Propose a novel Spatial Feature Transform layer to efficiently incorporate the categorical conditions into a CNN-based SR network.
Extensive comparisons and a user study demonstrate the capability of SFT-Net in generating realistic and visually pleasing textures.

Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning Ke Yu Chao Dong Liang Lin Chen Change Loy

Image Restoration
There are many individual tasks: denoising, deblurring, JPEG deblocking, super-resolution.
Towards more complicated distortions [1, 2, 3]: address multiple levels of degradation in one task; address multiple individual tasks.

Image Restoration: A New Setting
Consider multiple distortions simultaneously.
Real-world scenario: image capture and storage.
Synthetic setting (our new task): Gaussian blur, Gaussian noise and JPEG compression.

Motivation
Can we use a single CNN to address multiple distortions?
Inefficient: it requires a huge network to handle all the possibilities.
Inflexible: all kinds of distorted images are processed with the same structure.
Find a more efficient and flexible approach: process different distortions in different ways.

Method: Decision Making
Progressively restore the image quality.
Treat image restoration as a decision making process.
[Figure: the agent inspects the image at each step. "Artifacts! Try a deblocking tool", "Blurry! Try a deblurring tool", "Noisy! Try a denoising tool", "Good enough :)"]

Method: Overview
Our framework requires a toolbox and an agent.

Method: Toolbox
We design 12 tools, each of which addresses a simple task.
Tool structures: 3-layer CNN [4] and 8-layer CNN.

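Method: Agent
Use reinforcement learning to address tool selection.
State: the current distorted image and the action at the last step.
Action: one of the 12 tools, or stopping.
Reward: the PSNR gain at each step.
Structure: [Figure: the input image $I_1$ goes through a feature extractor and the last action through a one-hot encoder to form the state $S_1$; an LSTM agent outputs $v_1$]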
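The state, action and reward described above can be tied together in a short episode loop. This is a minimal sketch rather than the authors' agent: images are flat lists of pixel values in [0, 1], `policy` is any caller-supplied function standing in for the learned LSTM agent, and the names `STOP`, `psnr`, `one_hot`, `run_episode` and the `max_steps=3` default are assumptions made for illustration.

```python
import math

STOP = -1  # hypothetical code for the stopping action

def psnr(image, target, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(image, target)) / len(target)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def one_hot(index, size):
    """Encode an action index as a one-hot vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def run_episode(image, target, tools, policy, max_steps=3):
    """Apply the tools chosen by `policy` until it stops or max_steps is reached.

    Returns the final image and the list of per-step rewards (PSNR gains).
    """
    num_actions = len(tools) + 1          # every tool plus the stopping action
    last_action = [0.0] * num_actions     # no action has been taken yet
    rewards = []
    for _ in range(max_steps):
        action = policy(image, last_action)  # state = (current image, last action)
        if action == STOP:
            break
        restored = tools[action](image)
        rewards.append(psnr(restored, target) - psnr(image, target))
        image = restored
        last_action = one_hot(action, num_actions)
    return image, rewards
```

Because each reward is a PSNR difference, the rewards of an episode sum to the total PSNR improvement from the input to the final image. The target is only needed to compute rewards during training; the policy itself sees just the current image and the last action.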

Method: Joint Training
Challenge of the middle state: intermediate results arise after several steps of processing, and none of the tools has seen these intermediate results.
Joint training: [Figure: toolchain 1 and toolchain 2 are each run forward, an MSE loss is computed at the end of the chain, and the error is propagated backward through the tools]

Experimental Results
Dataset: DIV2K [5].
Comparison with generic models for image restoration: VDSR [1] and DnCNN [3].

Experimental Results
Quantitative results on DIV2K: competitive performance, better generality.
Runtime analyses: more efficient.

Experimental Results
[Figure: qualitative results on DIV2K at three distortion levels: mild (unseen), moderate and severe (unseen). Columns: input, 1st step, 2nd step, 3rd step, VDSR-s, VDSR [1]]

Experimental Results
[Figure: qualitative results on real-world images. Columns: input, 1st step, 2nd step, 3rd step, VDSR [1]]

Experimental Results
Ablation study: joint training, stopping action.

Conclusion
Contributions:
Address image restoration in a reinforcement learning framework.
Propose joint learning to cope with the middle processing state.
The dynamically formed toolchain performs competitively against human-designed networks with less computational complexity.
Future work:
Incorporate more tools (trained with GAN loss).
Handle spatially variant distortions.

Thanks! Q & A

References
[1] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[2] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
[3] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. TIP, 2017.
[4] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 38(2):295-307, 2016.
[5] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR Workshop, 2017.