Pixel-level Generative Model


Pixel-level Generative Model. Covered papers:
- Generative Image Modeling Using Spatial LSTMs (NIPS 2015). L. Theis and M. Bethge, University of Tübingen, Germany.
- Pixel Recurrent Neural Networks (ICML 2016). A. van den Oord, N. Kalchbrenner and K. Kavukcuoglu, Google DeepMind, UK.
- Conditional Image Generation with PixelCNN Decoders (NIPS 2016). A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves and K. Kavukcuoglu, Google DeepMind, UK.
Presented by Qi Wei, June 23rd, 2017.

Open problem: how to generate a good image?
Target: model the distribution of natural images.
Expectation: expressive, tractable and scalable.
Difficulty: strong statistical dependencies over hundreds of pixels.
Strategy of pixel-level generation: an explicit density model. Use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions:

p(x; θ) = ∏_{i,j} p(x_{i,j} | x_{<ij}; θ)    (1)

where the left-hand side is the likelihood of image x and each factor is the conditional distribution of one pixel given all previous pixels. Training maximizes the likelihood of the training data.
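The chain-rule factorization in Eq. (1) can be made concrete with a toy numpy sketch. The conditional here is a hypothetical first-order model (each pixel depends only on its raster-order predecessor, a stand-in for the full dependence on all previous pixels), with made-up probabilities; the point is only that the log-likelihood is the sum of per-pixel log conditionals.

```python
import numpy as np

# Hypothetical toy conditional over binary pixels {0, 1}: depends only on
# the previous pixel in raster order (a simplified stand-in for x_{<ij}).
def cond_prob(prev_pixel, value):
    # p(x_i = value | x_{i-1} = prev_pixel); made-up numbers for illustration
    p_one = 0.8 if prev_pixel == 1 else 0.3
    return p_one if value == 1 else 1.0 - p_one

def log_likelihood(image):
    """log p(x) = sum of log conditionals, i.e. the log of the chain-rule product."""
    flat = image.ravel()   # raster order: row by row, left to right
    logp = np.log(0.5)     # p(x_1): uniform prior on the first pixel
    for i in range(1, flat.size):
        logp += np.log(cond_prob(flat[i - 1], flat[i]))
    return logp

x = np.array([[1, 1],
              [0, 1]])
print(log_likelihood(x))  # log(0.5 * 0.8 * 0.2 * 0.3)
```

Maximum-likelihood training would adjust the conditional's parameters to make this quantity large on average over training images.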

PixelRNN: generate image pixels starting from the corner; the dependency on previous pixels is modeled using an RNN (LSTM).


PixelCNN: also generates image pixels starting from the corner, but the dependency on previous pixels is modeled using a CNN. Comparison with PixelRNN: training PixelCNN is faster, but generation still proceeds sequentially, pixel by pixel, so sampling remains slow.
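Why sampling stays slow even with a CNN can be sketched as follows. `model_forward` is a hypothetical stand-in for a trained PixelCNN (here it just returns constant probabilities over binary pixels); the essential structure is that generating an h×w image requires h·w full forward passes, one per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_forward(image):
    # Stand-in for a trained PixelCNN: returns, for every position, the
    # probability that the pixel is 1. Any network respecting the causal
    # masking could be dropped in here; this stub returns constants.
    return np.full(image.shape, 0.5)

def sample(h, w):
    """Pixels are generated one at a time in raster order: each step
    re-runs the whole network but uses only the next pixel's output."""
    img = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            probs = model_forward(img)               # one full forward pass...
            img[i, j] = rng.random() < probs[i, j]   # ...to sample one pixel
    return img

out = sample(4, 4)  # 16 forward passes for a 4x4 image
print(out.shape)
```

Training, by contrast, can evaluate all conditionals in parallel from a known image, which is why PixelCNN trains much faster than it samples.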

Outline:
1. Spatial LSTMs
2. PixelRNN
3. PixelCNN

Spatial LSTMs: a recurrent image model based on multi-dimensional LSTMs.
Target: handle the long-range dependencies that are central to object and scene understanding.
Advantages: scales to images of arbitrary size; the likelihood is computationally tractable.
Model structure:
- MCGSM: mixture of conditional Gaussian scale mixtures
- Spatial LSTM (SLSTM): a special case of the multi-dimensional LSTM (Graves & Schmidhuber, 2009)
- RIDE: recurrent image density estimator, i.e., MCGSM + SLSTM

Recurrent model of natural images (MCGSM + RIDE). Figure: A: pixel dependencies. B: causal neighborhood. C: a visualization of the proposed recurrent image model with two layers of spatial LSTMs.

MCGSM: mixtures of conditional Gaussian scale mixtures. Distribution of any parametric model with parameters θ:

p(x; θ) = ∏_{i,j} p(x_{i,j} | x_{<ij}; θ)    (2)

Improve the representational power by allowing per-pixel parameters:

p(x; {θ_{ij}}) = ∏_{i,j} p(x_{i,j} | x_{<ij}; θ_{ij})    (3)

The conditional distribution in an MCGSM factorizes into a gate and an expert:

p(x_{ij} | x_{<ij}, θ_{ij}) = Σ_{c,s} p(c, s | x_{<ij}, θ_{ij}) · p(x_{ij} | x_{<ij}, c, s, θ_{ij})    (4)

where the first factor (the gate) selects a mixture component c and scale s, and the second (the expert) models the pixel given that choice.

MCGSM: mixtures of conditional Gaussian scale mixtures (cont.). The conditional distribution, restated:

p(x_{ij} | x_{<ij}, θ_{ij}) = Σ_{c,s} p(c, s | x_{<ij}, θ_{ij}) · p(x_{ij} | x_{<ij}, c, s, θ_{ij})    (5)

with gate and expert

p(c, s | x_{<ij}, θ_{ij}) ∝ exp(η_{cs} − (1/2) e^{α_{cs}} x_{<ij}^T K_c x_{<ij})
p(x_{ij} | x_{<ij}, c, s) = N(x_{ij}; a_c^T x_{<ij}, e^{−α_{cs}})    (6)

where K_c is positive definite. To reduce the number of parameters, K_c is approximated by a low-rank expansion:

K_c ≈ Σ_n β_{cn}² b_n b_n^T    (7)
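Evaluating the MCGSM conditional density can be sketched in numpy. Assumptions to flag: the expert variance is taken as e^{−α_cs} (the log-precision convention implied by the gate's e^{α_cs} factor), the β factors of Eq. (7) are folded into the columns of low-rank factors B_c, and all parameter values are random stand-ins rather than trained ones.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def mcgsm_density(x_ij, x_past, eta, alpha, a, B):
    """p(x_ij | x_<ij) as a gated mixture (Eqs. 5-7).
    Shapes: eta, alpha are (C, S); a is (C, d); B is (C, d, r) so that
    K_c = B_c @ B_c.T, with the beta_cn factors absorbed into B_c."""
    K = B @ np.transpose(B, (0, 2, 1))                       # (C, d, d), PSD
    quad = np.einsum('i,cij,j->c', x_past, K, x_past)        # x^T K_c x, (C,)
    log_gates = eta - 0.5 * np.exp(alpha) * quad[:, None]    # (C, S)
    gates = np.exp(log_gates - log_gates.max())
    gates /= gates.sum()                                     # normalise over (c, s)
    means = a @ x_past                                       # expert means, (C,)
    experts = normal_pdf(x_ij, means[:, None], np.exp(-alpha))  # (C, S)
    return float((gates * experts).sum())

rng = np.random.default_rng(0)
C, S, d, r = 2, 3, 4, 2   # components, scales, neighbourhood size, rank
p = mcgsm_density(0.1, rng.standard_normal(d),
                  rng.standard_normal((C, S)), rng.standard_normal((C, S)),
                  rng.standard_normal((C, d)), rng.standard_normal((C, d, r)))
print(p)   # a positive density value
```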

Spatial LSTM operations:

c_{ij} = g_{ij} ⊙ i_{ij} + c_{i,j−1} ⊙ f^c_{ij} + c_{i−1,j} ⊙ f^r_{ij}
h_{ij} = o_{ij} ⊙ tanh(c_{ij})    (8)

(g_{ij}, o_{ij}, i_{ij}, f^r_{ij}, f^c_{ij}) = (tanh, σ, σ, σ, σ) applied elementwise to T_{A,b}(x_{<ij}, h_{i,j−1}, h_{i−1,j})    (9)

where:
- σ is the logistic sigmoid function
- ⊙ indicates a pointwise product
- T_{A,b} is an affine transformation which depends on the only parameters of the network, A and b
- i_{ij}, o_{ij}, f^c_{ij}, f^r_{ij} are gating units
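The update in Eqs. (8)-(9) can be sketched as a single cell step in numpy. The affine map T_{A,b} is one matrix acting on the concatenated causal inputs; the particular slicing of its output into the five gates, and the toy sizes, are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slstm_cell(x, h_left, h_up, c_left, c_up, A, b):
    """One spatial-LSTM update: five gate pre-activations from one affine
    map T_{A,b}; the new cell state mixes the states of the left (column)
    and upper (row) neighbours."""
    d = h_left.size
    z = A @ np.concatenate([x, h_left, h_up]) + b   # T_{A,b}
    g     = np.tanh(z[0*d:1*d])    # proposed update
    o     = sigmoid(z[1*d:2*d])    # output gate
    i     = sigmoid(z[2*d:3*d])    # input gate
    f_row = sigmoid(z[3*d:4*d])    # forget gate for the row neighbour
    f_col = sigmoid(z[4*d:5*d])    # forget gate for the column neighbour
    c = g * i + c_left * f_col + c_up * f_row
    h = o * np.tanh(c)
    return h, c

# Hypothetical sizes: 3 causal pixels in x_<ij, hidden dimension 2
rng = np.random.default_rng(0)
d, nx = 2, 3
A = rng.standard_normal((5 * d, nx + 2 * d))
b = np.zeros(5 * d)
h, c = slstm_cell(rng.standard_normal(nx), np.zeros(d), np.zeros(d),
                  np.zeros(d), np.zeros(d), A, b)
print(h.shape, c.shape)
```

Scanning this cell over the image row by row yields the hidden states h_{ij} that the MCGSM conditions on.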

RIDE: recurrent image density estimator. A factorized MCGSM:

p(x_{ij} | x_{<ij}) = p(x_{ij} | h_{ij})    (10)

- The hidden state h_{ij} depends only on pixels in x_{<ij}, so conditioning on it does not violate the autoregressive factorization.
- This allows RIDE to use pixels from a much larger region for prediction.
- The pixels are nonlinearly transformed before the MCGSM is applied.
- The representational power of the model can be increased by stacking spatial LSTMs.

Experimental results: natural images. Figure: average log-likelihoods and log-likelihood rates for image patches (without/with the DC component) and large images extracted from BSDS300 [25]. Figure: average log-likelihood rates for image patches and large images extracted from van Hateren's dataset [48].

Experimental results (cont.): dead leaves. Figure: from top to bottom, a 256-by-256-pixel crop of a texture [2], a sample generated by an MCGSM trained on the full texture [7], and a sample generated by RIDE (highlights: D104, D34).

Experimental results (cont.): texture synthesis and inpainting. Figure: the center portion of a texture (left and center) was reconstructed by sampling from the posterior distribution of RIDE (right).

Summary.
Pros:
- RIDE: a deep but tractable recurrent image model based on spatial LSTMs
- superior performance in quantitative comparisons
- able to capture many different statistical patterns
Cons:
- only works on grayscale images

PixelRNN: a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions, similar in spirit to RIDE.
Novelty:
- fast two-dimensional recurrent layers
- residual connections in the RNN
- handles RGB images
Advantages:
- scales to images of arbitrary size
- better log-likelihood scores on natural images
- generates more crisp, varied and globally coherent samples

Model structure. For a single-band image:

p(x) = ∏_{i=1}^{n²} p(x_i | x_1, …, x_{i−1})    (11)

For RGB images, the three channels of each pixel are predicted in turn:

p(x_i | x_{<i}) = p(x_{i,R} | x_{<i}) · p(x_{i,G} | x_{<i}, x_{i,R}) · p(x_{i,B} | x_{<i}, x_{i,R}, x_{i,G})    (12)

Pixels as discrete variables: each channel variable x_{i,·} simply takes one of 256 distinct values, modeled with a 256-way softmax.
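The discrete-pixel treatment can be sketched as follows: the network emits 256 logits per channel (here replaced by random stand-ins), a softmax turns them into a categorical distribution over intensities, and the channel value is sampled from it.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())   # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

# Stand-in for the network's output for one channel of one pixel:
# 256 unnormalised scores, one per intensity value.
logits = rng.standard_normal(256)
p = softmax(logits)                     # categorical distribution over {0..255}
value = rng.choice(256, p=p)            # one sampled intensity
print(p.sum(), value)
```

Modeling intensities as categories (rather than a continuous density) lets the model represent multimodal conditionals without any distributional assumption.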

Model structure (cont.): Row LSTM & Diagonal BiLSTM.

Model structure (cont.): residual connections. Figure: left: PixelCNN; right: PixelRNN.

PixelRNN experiment results: residual connections on MNIST.

PixelRNN experiment results: MNIST.

PixelRNN experiment results: generated images (32×32). (a) CIFAR-10. (b) ImageNet.

PixelRNN experiment results: ImageNet (64×64). (c) normal model. (d) multi-scale model.

PixelRNN experiment results: image completions (32×32).

PixelCNN. Target: generating images conditioned on any vector, e.g., labels, tags or embeddings.
Advantages:
- generates diverse, realistic scenes representing distinct animals, objects, landscapes and structures
- serves as a powerful decoder
- improves the log-likelihood of PixelRNN on ImageNet
- much faster to train than PixelRNN

Visualization of PixelCNN. Figure: left: visualization of PixelCNN; middle: masked convolution filter; right: blind spot in the receptive field.
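The masked convolution filter can be sketched for the single-channel case (the extra per-channel bookkeeping needed for RGB masks is omitted here). A type 'A' mask, used in the first layer, also hides the centre pixel so the model cannot see the value it is predicting; type 'B' masks, used in later layers, keep the centre.

```python
import numpy as np

def causal_mask(k, mask_type):
    """k x k binary mask for a masked convolution (single-channel case)."""
    m = np.ones((k, k))
    centre = k // 2
    # Zero out entries to the right of the centre; type 'A' also zeroes
    # the centre itself, type 'B' keeps it.
    m[centre, centre + (1 if mask_type == 'B' else 0):] = 0
    m[centre + 1:, :] = 0    # zero out all rows below the centre
    return m

print(causal_mask(3, 'A'))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
print(causal_mask(3, 'B'))
# [[1. 1. 1.]
#  [1. 1. 0.]
#  [0. 0. 0.]]
```

Multiplying the convolution weights by this mask before each forward pass enforces the raster-order factorization; stacking such filters is what produces the blind spot that the Gated PixelCNN's two-stack design later removes.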

Gated convolutional layers. Joint distribution of pixels over an image:

p(x) = ∏_{i=1}^{n²} p(x_i | x_1, …, x_{i−1})    (13)

Replace the ReLUs between masked convolutions with the gated activation unit:

y = tanh(W_{k,f} ∗ x) ⊙ σ(W_{k,g} ∗ x)    (14)
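Eq. (14) can be sketched in numpy, with the masked convolutions stood in for by plain matrix products on a flattened feature vector (weights and sizes are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_activation(x, W_f, W_g):
    # y = tanh(W_f x) * sigmoid(W_g x): the "filter" branch and the "gate"
    # branch see the same input and are merged by a pointwise product.
    return np.tanh(W_f @ x) * sigmoid(W_g @ x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)             # stand-in feature vector
W_f = rng.standard_normal((4, 8))      # filter weights
W_g = rng.standard_normal((4, 8))      # gate weights
y = gated_activation(x, W_f, W_g)
print(y.shape)
```

Because tanh lies in (−1, 1) and the sigmoid gate in (0, 1), every output component is bounded, and the multiplicative gate lets the layer model sharper interactions than a ReLU.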

Conditional PixelCNN. Distribution of the conditional PixelCNN:

p(x | h) = ∏_{i=1}^{n²} p(x_i | x_1, …, x_{i−1}, h)    (15)

If h is location-independent:

y = tanh(W_{k,f} ∗ x + V_{k,f}^T h) ⊙ σ(W_{k,g} ∗ x + V_{k,g}^T h)    (16)

If h is location-dependent:

y = tanh(W_{k,f} ∗ x + V_{k,f} ∗ s) ⊙ σ(W_{k,g} ∗ x + V_{k,g} ∗ s)    (17)

where s = m(h) is a spatial representation produced by a deconvolutional neural network m(·).
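The location-independent case, Eq. (16), differs from the plain gated unit only by the added conditioning bias V^T h in both branches. A sketch with made-up shapes, where h stands for e.g. a class embedding:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_gated_activation(x, h, W_f, W_g, V_f, V_g):
    """Eq. (16): the same bias V h is added to every spatial position of
    both the filter and the gate branch before the gating nonlinearities."""
    return np.tanh(W_f @ x + V_f @ h) * sigmoid(W_g @ x + V_g @ h)

rng = np.random.default_rng(1)
x = rng.standard_normal(8)             # stand-in feature vector
h = rng.standard_normal(3)             # conditioning vector (e.g. class embedding)
W_f, W_g = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
V_f, V_g = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
y = cond_gated_activation(x, h, W_f, W_g, V_f, V_g)
print(y.shape)
```

For the location-dependent case, Eq. (17), the products V h would be replaced by convolutions with a spatial map s = m(h).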

Gated PixelCNN. Figure: a single layer in the Gated PixelCNN architecture.

PixelCNN experiment results: unconditional modeling with Gated PixelCNN. (a) CIFAR-10. (b) ImageNet.

PixelCNN experiment results (cont.): conditioning on ImageNet classes.

PixelCNN experiment results (cont.): conditioning on portrait embeddings.

PixelCNN experiment results (cont.): PixelCNN auto-encoder.

Summary.
Pros:
- provides a way to calculate the likelihood
- training is more stable than for GANs
- works for both discrete and continuous data
- faster to train than PixelRNN
Cons:
- assumes a fixed generation order: top to bottom, left to right
- slow in sampling, as in PixelRNN
- slow in training (though faster than PixelRNN): PixelCNN++ (from OpenAI) converges in 5 days on 8 Titan GPUs for the CIFAR dataset

Take-home message. Both PixelRNN and PixelCNN can:
- Quantitative: explicitly compute the likelihood p(x), which is high on natural images
- Qualitative: generate visually good image samples
PixelRNN is slow to train and slow to generate; PixelCNN is fast to train but still slow to generate.
Other interesting works: PixelVAE, PixelCNN++, PixelGAN, ...