Generative Adversarial Networks (GANs)


Generative Adversarial Networks (GANs). Hossein Azizpour. Most of the slides are courtesy of Dr. Ian Goodfellow (Research Scientist at OpenAI) and come from his NIPS 2016 tutorial presentation. Note: I am generally knowledgeable in deep learning but not particularly an expert on GANs.

GANs Whatever you learned about deep learning in general applies to GANs, to a higher degree: the theoretical ground is not comprehensive, we do not understand the inner workings very well, and there is a lot to learn from practice. It is hard to train a GAN that looks successful to the human eye; it fails the vast majority of the time if you plainly use the original GAN formulation. The results are really cool!

Modeling Discriminative modeling: $P_{Y|X}(y|x)$ or $f(x, y)$. Generative modeling: $P_{X|Y}(x|y)$ and $P_X(x)$.

Generative Modeling Density estimation: $p_X(x)$, $p_{X|Y}(x|y)$, $p_{X|Z}(x|z)$, $p_{Z|Y}(z|y)$. Sample generation: $x \sim p_{X|Z}(x|z)$. (Figure: training examples vs. model samples.)

Content Why study generative modeling? How do generative models work? How do GANs compare to others? How do GANs work? Tips and tricks Research frontiers

Why study generative models? Theoretical reason: an excellent test of our ability to use high-dimensional, complicated probability distributions. Practical reasons: simulate possible futures for planning or simulated RL; missing data; semi-supervised learning; multi-modal outputs; realistic generation tasks.

Next Video Frame Prediction (Lotter et al 2016)

Single Image Super-Resolution (Ledig et al 2016)

iGAN (YouTube demo): Zhu et al., iGAN: Generative Visual Manipulation on the Natural Image Manifold (2016), a collaboration between Adobe and Berkeley

Introspective Adversarial Networks (Brock et al 2016) (YouTube demo)

Image-to-Image Translation (Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, 2016)

Content Why study generative modeling? How do generative models work? How do GANs compare to others? How do GANs work? Tips and tricks Research frontiers Combining GANs with other methods

Loss functions Two approaches: increase the log-likelihood of the data (maximum likelihood), or have your network learn its loss function! (adversarial learning)

Maximum Likelihood $\theta^* = \arg\max_{\theta} \, \mathbb{E}_{x \sim p_{\text{data}}} \log p_{\text{model}}(x; \theta)$
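To make the objective concrete, here is a minimal sketch (not from the slides; the toy data and model are purely illustrative) that fits a diagonal Gaussian p_model(x; theta) by gradient ascent on the average log-likelihood, using PyTorch.

import torch

# Hypothetical toy data standing in for samples from p_data.
data = torch.randn(1000, 2) * 1.5 + 3.0

# Model parameters theta = (mu, log_sigma) of a diagonal Gaussian p_model(x; theta).
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(2000):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    # Estimate E_{x ~ p_data} log p_model(x; theta) on the training set.
    log_lik = dist.log_prob(data).sum(dim=1).mean()
    loss = -log_lik          # maximizing the log-likelihood = minimizing its negative
    opt.zero_grad()
    loss.backward()
    opt.step()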

Variational Autoencoder (Kingma and Welling 2013, Rezende et al 2014) $\log p(x) \ge \log p(x) - D_{KL}\!\left(q(z) \,\|\, p(z \mid x)\right) = \mathbb{E}_{z \sim q} \log p(x, z) + H(q)$ Disadvantages: not asymptotically consistent unless q is perfect; samples tend to have lower quality. CIFAR-10 samples (Kingma et al 2016)
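As a rough sketch of how this bound is typically optimized in practice (this is a generic toy VAE, not the models of the cited papers; sizes and layer choices are illustrative, and inputs x are assumed to lie in [0, 1] for a Bernoulli decoder):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        x_logits = self.dec(z)
        # ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)); we return its negative as a loss.
        recon = -F.binary_cross_entropy_with_logits(x_logits, x, reduction='none').sum(1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1)
        return -(recon - kl).mean()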

Adversarial Learning The output distribution of your desired input-output mapping function is non-trivial, e.g. the (class-conditional) image manifold. You have access to samples of the output distribution. Then you can learn your loss function to say how easily the samples of the output distribution can be told apart from the samples of your mapping function.

GANs Use a latent code. Are unstable to train. Often regarded as producing the best samples, but there is no good way to quantify this.

Content Why study generative modeling? How do generative models work? How do GANs compare to others? How do GANs work? Tips and tricks Research frontiers

Adversarial Nets Framework

Generator Network $x = G(z; \theta_G)$ - Must be differentiable - No invertibility requirement - Trainable for any size of z - Can make x conditionally Gaussian given z but need not do so
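As a concrete (purely illustrative) example of such a differentiable mapping, a small fully connected generator in PyTorch; any architecture works as long as x = G(z; theta_G) is differentiable in theta_G.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, x_dim), nn.Tanh(),   # outputs scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)        # x = G(z; theta_G), differentiable in theta_G

z = torch.randn(64, 100)          # latent code of whatever size you choose
x_fake = Generator()(z)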

Training Procedure Use SGD-like algorithm of choice (Adam) on two minibatches simultaneously: A minibatch of training examples A minibatch of generated samples Optional: run k steps of one player for every step of the other player.
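A minimal sketch of this procedure (assumptions: the illustrative Generator above, an analogous Discriminator ending in a single real/fake logit, and a data_loader yielding real minibatches; hyperparameters are illustrative). The generator update here already uses the non-saturating cost discussed a few slides below.

import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()          # D maps x to a single real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
k = 1                                        # discriminator steps per generator step

for real in data_loader:                     # minibatch of training examples
    for _ in range(k):
        z = torch.randn(real.size(0), 100)
        fake = G(z).detach()                 # minibatch of generated samples
        d_loss = F.binary_cross_entropy_with_logits(D(real), torch.ones(real.size(0), 1)) \
               + F.binary_cross_entropy_with_logits(D(fake), torch.zeros(real.size(0), 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    z = torch.randn(real.size(0), 100)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()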

Minimax Game $J^{(D)} = -\tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}} \log D(x) - \tfrac{1}{2}\,\mathbb{E}_{z} \log\!\left(1 - D(G(z))\right)$, $J^{(G)} = -J^{(D)}$. Equilibrium is a saddle point of the discriminator loss. The generator minimizes the log-probability of the discriminator being correct.

Discriminator Strategy Optimal discriminator: $D(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{\text{model}}(x)}$. Estimating this ratio using supervised learning is the key approximation mechanism used by GANs. (Figure: discriminator output plotted against the data distribution and the model distribution over x, with z mapped to x by the generator.)
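This ratio can be checked numerically. A small, purely illustrative sketch with two known 1-D Gaussians standing in for p_data and p_model, where the Bayes-optimal discriminator is exactly p_data(x) / (p_data(x) + p_model(x)):

import torch

p_data = torch.distributions.Normal(0.0, 1.0)    # "true" data distribution
p_model = torch.distributions.Normal(1.0, 1.5)   # current generator distribution

x = torch.linspace(-4.0, 5.0, 7)
d_star = p_data.log_prob(x).exp() / (p_data.log_prob(x).exp() + p_model.log_prob(x).exp())
print(d_star)   # close to 1 where p_data dominates, close to 0 where p_model dominates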

Non-Saturating Game $J^{(D)} = -\tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}} \log D(x) - \tfrac{1}{2}\,\mathbb{E}_{z} \log\!\left(1 - D(G(z))\right)$, $J^{(G)} = -\tfrac{1}{2}\,\mathbb{E}_{z} \log D(G(z))$. The equilibrium is no longer describable with a single loss. The generator maximizes the log-probability of the discriminator being mistaken. Heuristically motivated; the generator can still learn even when the discriminator successfully rejects all generator samples.
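The difference between the two generator costs is easiest to see in code. A hedged sketch (names and values illustrative): the minimax cost log(1 - D(G(z))) flattens out when D confidently rejects the samples, while the non-saturating cost -log D(G(z)) still gives a strong gradient there.

import torch

def generator_losses(fake_logits):
    # fake_logits = D(G(z)) before the sigmoid
    d_fake = torch.sigmoid(fake_logits)
    minimax = torch.log(1.0 - d_fake + 1e-8).mean()        # nearly flat as d_fake -> 0
    non_saturating = -torch.log(d_fake + 1e-8).mean()      # steep as d_fake -> 0
    return minimax, non_saturating

# Large negative logits = discriminator confidently rejects the fakes.
logits = torch.tensor([-6.0, -4.0, -2.0], requires_grad=True)
mm, ns = generator_losses(logits)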

Non-Saturating Game How does it work in practice?

DCGAN Architecture Most deconvs are batch normalized (Radford et al 2015)

DCGANs for LSUN Bedrooms (Radford et al 2015)

Vector Space Arithmetic Man with glasses − Man + Woman ≈ Woman with glasses (Radford et al, 2015)

Latent Space Interpolation

Content Why study generative modeling? How do generative models work? How do GANs compare to others? How do GANs work? Tips and tricks Research frontiers

Labels improve subjective sample quality Learning a conditional model p(x|y) often gives much better samples from all classes than learning p(x) does (Denton et al 2015). Even just learning p(x,y) makes samples from p(x) look much better to a human observer (Salimans et al 2016). Note: this defines three categories of models (no labels, trained with labels, generating conditioned on labels) that should not be compared directly to each other.

One-sided label smoothing Default discriminator cost: cross_entropy(1., discriminator(data)) + cross_entropy(0., discriminator(samples)). One-sided label smoothed cost (Salimans et al 2016): cross_entropy(.9, discriminator(data)) + cross_entropy(0., discriminator(samples))
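The pseudo-code above, written out as a runnable PyTorch sketch (interpreting cross_entropy as binary cross-entropy on logits; function and argument names are illustrative):

import torch
import torch.nn.functional as F

def d_loss_smoothed(real_logits, fake_logits, smooth=0.9):
    # one-sided: smooth only the positive (real) targets, keep fake targets at 0
    real_targets = torch.full_like(real_logits, smooth)
    fake_targets = torch.zeros_like(fake_logits)
    return F.binary_cross_entropy_with_logits(real_logits, real_targets) \
         + F.binary_cross_entropy_with_logits(fake_logits, fake_targets)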

Do not smooth negative labels: cross_entropy(1.-alpha, discriminator(data)) + cross_entropy(beta, discriminator(samples)) with beta > 0 reinforces current generator behavior.

Benefits of label smoothing Good regularizer (Szegedy et al 2015) Does not reduce classification accuracy, only confidence Benefits specific to GANs: Prevents discriminator from giving very large gradient signal to generator Prevents extrapolating to encourage extreme samples

Batch Norm Given inputs X = {x^(1), x^(2), ..., x^(m)}: compute the mean and standard deviation of the features of X, then normalize the features (subtract the mean, divide by the standard deviation). The normalization operation is part of the graph, so backpropagation computes the gradient through the normalization. This avoids wasting time repeatedly learning to undo the normalization.

Batch norm in G can cause strong intra-batch correlation

Reference Batch Norm Fix a reference batch R = {r^(1), r^(2), ..., r^(m)}. Given new inputs X = {x^(1), x^(2), ..., x^(m)}, compute the mean and standard deviation of the features of R. Note that though R does not change, its feature values change when the parameters change. Normalize the features of X using the mean and standard deviation from R. Every x^(i) is always treated the same, regardless of which other examples appear in the minibatch.
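A sketch contrasting ordinary batch norm with the reference-batch variant described on these two slides (a simplified per-feature version with illustrative shapes; the virtual batch norm variant that mixes each example with the reference batch is not shown):

import torch

def batch_norm(x, eps=1e-5):
    # statistics come from the current minibatch x itself
    mean, std = x.mean(dim=0), x.std(dim=0)
    return (x - mean) / (std + eps)

def reference_batch_norm(x, r, eps=1e-5):
    # statistics come from a fixed reference batch r, so every x[i] is
    # normalized the same way regardless of its minibatch companions
    mean, std = r.mean(dim=0), r.std(dim=0)
    return (x - mean) / (std + eps)

r = torch.randn(64, 128)   # reference batch, fixed once at the start of training
x = torch.randn(64, 128)   # a new minibatch of features
x_bn, x_rbn = batch_norm(x), reference_batch_norm(x, r)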

Balancing G and D Usually the discriminator wins. This is a good thing: the theoretical justifications are based on assuming D is perfect. Usually D is bigger and deeper than G. Sometimes run D more often than G; mixed results. Do not try to limit D to avoid making it too smart; use the non-saturating cost and label smoothing instead.

Tips and Tricks There are many heuristics listed on the Torch GAN tricks page collected by Soumith Chintala.

Content Why study generative modeling? How do generative models work? How do GANs compare to others? How do GANs work? Tips and tricks Research frontiers Combining GANs with other methods

Non-convergence Optimization algorithms often approach a saddle point or local minimum rather than a global minimum Game solving algorithms may not approach an equilibrium at all

Non-convergence in GANs Exploiting convexity in function space, GAN training is theoretically guaranteed to converge if we could modify the density functions directly, but: instead, we modify G (the sample generation function) and D (the density ratio), not the densities, and we represent G and D as highly non-convex parametric functions. Oscillation: training can run for a very long time, generating very many different categories of samples, without clearly generating better samples. Mode collapse: the most severe form of non-convergence.

Mode Collapse D in inner loop: convergence to correct distribution G in inner loop: place all mass on most likely point (Metz et al 2016)

Mode Collapse GANs often seem to collapse to far fewer modes than the model can represent

Mode collapse causes low output diversity. One remedy: multiple parallel generators that share parameters up to layer l, with a diversity loss applied across the different generators. Alternatively, have D predict which generator the fake sample came from! Ghosh et al., Multi-Agent Diverse Generative Adversarial Networks (2017)

Minibatch Features Discriminator classifies a minibatch of either fake or true examples as opposed to an isolated one Add minibatch features that classify each example by comparing it to other members of the minibatch (Salimans et al 2016) Nearest-neighbor style features detect if a minibatch contains samples that are too similar to each other (coming from a single mode)
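A simplified sketch of the idea (not the exact formulation of Salimans et al 2016): append to each example's features a statistic of its distances to the other members of the minibatch, so the discriminator can notice a batch of near-identical samples coming from a single mode.

import torch

def with_minibatch_feature(h):
    # h: (batch, features) hidden activations inside the discriminator
    dists = torch.cdist(h, h, p=1)                     # pairwise L1 distances, (batch, batch)
    # average distance of each example to the others: small => suspiciously similar batch
    closeness = dists.sum(dim=1, keepdim=True) / (h.size(0) - 1)
    return torch.cat([h, closeness], dim=1)            # one extra feature per example

h = torch.randn(64, 256)
h_aug = with_minibatch_feature(h)                      # shape (64, 257)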

Minibatch GAN on CIFAR Training Data Samples (Salimans et al 2016)

Minibatch GAN on ImageNet (Salimans et al 2016)

Cherry-Picked Results

Problems with Counting

Problems with Perspective

Problems with Global Structure

This one is real

Unrolled GANs Backprop through k updates of the discriminator to prevent mode collapse: Metz et al., Unrolled Generative Adversarial Networks (2016)

Wasserstein GAN Attacks two problems: stabilizing the training and providing a convergence criterion. No log in the loss: the output of D is no longer a probability, hence we do not apply a sigmoid at the output of D. Clip the weights of D. Train D more than G. Use RMSProp instead of Adam. Use a lower learning rate; the paper uses α = 0.00005.
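The listed changes, written as a hedged training sketch (the critic D, generator G, and data_loader are assumed to exist; D ends in a raw linear output with no sigmoid, and hyperparameters follow the slide):

import torch

opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)   # RMSProp, low learning rate
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
n_critic, clip = 5, 0.01                               # train D more often; weight clipping

for real in data_loader:
    for _ in range(n_critic):
        z = torch.randn(real.size(0), 100)
        # critic loss: no log, D outputs an unbounded score rather than a probability
        d_loss = -(D(real).mean() - D(G(z).detach()).mean())
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        for p in D.parameters():
            p.data.clamp_(-clip, clip)                 # clip the critic's weights

    z = torch.randn(real.size(0), 100)
    g_loss = -D(G(z)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()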

Wasserstein GAN

Evaluation There is no single compelling way to evaluate a generative model. Models with good likelihood can produce bad samples; models with good samples can have bad likelihood. There is no good way to quantify how good samples are. For GANs, it is also hard to even estimate the likelihood. See A Note on the Evaluation of Generative Models, Theis et al 2015, for a good overview.

Discrete outputs G must be differentiable, and it cannot be differentiable if the output is discrete. Possible workarounds: REINFORCE (Williams 1992); the Concrete distribution (Maddison et al 2016) or Gumbel-softmax (Jang et al 2016); learn a distribution over continuous embeddings, then decode to discrete.
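As a sketch of the Gumbel-softmax workaround (sizes are illustrative; PyTorch ships this relaxation as F.gumbel_softmax): the generator produces logits over a vocabulary and emits a differentiable, near-one-hot sample instead of a hard discrete token.

import torch
import torch.nn.functional as F

vocab_size, batch = 1000, 32
logits = torch.randn(batch, vocab_size, requires_grad=True)   # generator's output logits

# Differentiable relaxed sample; lower tau => closer to a discrete one-hot vector.
soft_tokens = F.gumbel_softmax(logits, tau=0.5, hard=False)

# Straight-through variant: discrete one-hot forward, soft gradients backward.
hard_tokens = F.gumbel_softmax(logits, tau=0.5, hard=True)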

Auxiliary Information One trick to stabilize training is to include auxiliary information, either in the input space (e.g. the current frame for future frame prediction) or in the output space (different classes of true examples).

Supervised Discriminator (Odena 2016, Salimans et al 2016). (Figure: a standard discriminator whose output is real vs. fake, next to a supervised discriminator whose outputs are real cat, real dog, and fake, both built from hidden units on top of the input.)

Learning interpretable latent codes / controlling the generation process InfoGAN (Chen et al 2016)

The coolest GAN ever!

Plug and Play Generative Models New state-of-the-art generative model (Nguyen et al 2016). Generates 227x227 realistic images from all ImageNet classes. Combines adversarial training, perceptual loss, autoencoders, and Langevin sampling. Nguyen et al., Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space (2016)

PPGN Samples (Nguyen et al 2016)

PPGN Samples (Nguyen et al 2016)

PPGN for caption to image (Nguyen et al 2016)

iGAN (YouTube demo): Zhu et al., iGAN: Generative Visual Manipulation on the Natural Image Manifold (2016), a collaboration between Adobe and Berkeley

Summary GANs are generative models that use supervised learning to approximate an intractable cost function. GANs can simulate many cost functions, including the one used for maximum likelihood. GANs are, at the moment, unstable to train and need many tricks to converge. Reaching Nash equilibrium is an important open research question.

Questions? Technical Report: Ian Goodfellow's technical report on GANs is a very easy and informative read: https://arxiv.org/pdf/1701.00160.pdf