Unsupervised Learning

Deep Learning for Graphics: Unsupervised Learning
Niloy Mitra (UCL), Iasonas Kokkinos (UCL/Facebook), Paul Guerrero (UCL), Vladimir Kim (Adobe Research), Kostas Rematas (U Washington), Tobias Ritschel (UCL)

Timetable
Presenters: Niloy, Iasonas, Paul, Vova, Kostas, Tobias.
Sessions: Introduction, Theory, NN Basics, Supervised Applications, Data, Unsupervised Applications, Beyond 2D, Outlook.

Unsupervised Learning
There is no direct ground truth for the quantity of interest.
Topics: autoencoders, variational autoencoders (VAEs), and generative adversarial networks (GANs).

Autoencoders
Goal: meaningful features that capture the main factors of variation in the dataset.
These features are good for classification, clustering, exploration, generation, and more.
We have no ground truth for them.
Diagram: input data -> encoder -> features.
Slide credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Autoencoders
Goal: meaningful features that capture the main factors of variation.
The features (latent variables) $z$ must be usable to reconstruct the image.
Diagram: input data $x$ -> encoder -> features $z$ -> decoder -> reconstruction $\hat{x}$.
L2 loss function: $\|x - \hat{x}\|^2$
Slide credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Autoencoders
A linear transformation for both encoder and decoder gives a result close to PCA.
Deeper networks give better reconstructions, since the basis can be non-linear.
Figure: original images compared with autoencoder and PCA reconstructions.
Image credit: Reducing the Dimensionality of Data with Neural Networks, Hinton and Salakhutdinov
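The link between a linear autoencoder and PCA can be sketched numerically. The snippet below is a minimal illustration, not the course notebook: it builds a linear encoder/decoder pair directly from the principal directions (rather than training one) and reports the L2 reconstruction loss for different code sizes. The toy data and the helper names `pca_autoencoder` and `l2_loss` are assumptions made for this example.

```python
import numpy as np

def pca_autoencoder(data, code_size):
    """Build a linear encoder/decoder pair from the top principal directions."""
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    basis = vt[:code_size]                       # shape (code_size, n_features)
    encode = lambda x: (x - mean) @ basis.T      # features / latent code
    decode = lambda z: z @ basis + mean          # reconstruction
    return encode, decode

def l2_loss(x, x_hat):
    """Squared L2 reconstruction error, averaged over samples."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
# Toy data (an assumption): 200 points in 5-D that mostly vary along 2 hidden factors.
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = factors @ mixing + 0.01 * rng.normal(size=(200, 5))

losses = {}
for k in (1, 2, 5):
    encode, decode = pca_autoencoder(data, k)
    losses[k] = l2_loss(data, decode(encode(data)))
```

A code of size 2 already reconstructs this data almost perfectly because the data has two main factors of variation; a non-linear autoencoder would be needed when the factors are not linear.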

Example: Document Word Probabilities to 2D Code
Figure: 2D codes of documents produced by LSA (based on PCA) compared with those produced by an autoencoder.
Image credit: Reducing the Dimensionality of Data with Neural Networks, Hinton and Salakhutdinov

Example: Semi-Supervised Classification
Setting: many images, but few ground truth labels.
Start unsupervised: train an autoencoder on the many images (input data -> encoder -> features (latent variables) -> decoder, with an L2 loss between input and reconstruction).
Then fine-tune supervised: train a classification network on the labeled images (input data -> encoder -> features -> classifier -> predicted label, with a loss function such as softmax against the ground truth label).
Slide credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Code example: Autoencoder (autoencoder.ipynb)

Generative Models
Assumption: the dataset consists of samples from an unknown distribution $p_{data}(x)$.
Goal: create a new sample from $p_{data}(x)$ that is not in the dataset.
Figure: dataset images next to generated images.
Image credit: Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al.

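Generative Models
A generator $G_\theta$ with parameters $\theta$ maps samples $z$ from a distribution $p(z)$ that is known and easy to sample from to generated samples $x = G_\theta(z)$.
The distribution of generated samples, $p_\theta(x)$, should be similar to $p_{data}(x)$.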

Generative Models
How do we measure the similarity of $p_\theta$ and $p_{data}$?
1) Likelihood of the data in $p_\theta$: Variational Autoencoders (VAEs).
2) Adversarial game: a discriminator distinguishes $p_\theta$ and $p_{data}$, while the generator makes them hard to distinguish: Generative Adversarial Networks (GANs).
Diagram: $z$ from $p(z)$, known and easy to sample from -> generator with parameters $\theta$.

Autoencoders as Generative Models?
Decoder = generator? A trained decoder transforms some features $z$ to approximate samples from $p_{data}$.
What happens if we pick a random $z$? We do not know the distribution $p(z)$ of features that decode to likely samples.
Figure: feature space / latent space.
Image credit: Reducing the Dimensionality of Data with Neural Networks, Hinton and Salakhutdinov

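Variational Autoencoders (VAEs)
Pick a parametric distribution $p(z)$ for the features.
The generator with parameters $\theta$ maps a sample $z$ to an image distribution $p_\theta(x \mid z)$.
Train the generator to maximize the likelihood of the data in $p_\theta(x)$: $\max_\theta \mathbb{E}_{x \sim p_{data}} \log p_\theta(x)$, where $p_\theta(x) = \mathbb{E}_{z \sim p(z)}\, p_\theta(x \mid z)$.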

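Outputting a Distribution
The generator with parameters $\theta$ takes a sample $z$ and outputs the parameters of a distribution over images rather than a single image.
Normal distribution: the generator outputs a mean (and optionally a variance) per pixel.
Bernoulli distribution: the generator outputs a probability per pixel.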
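Once the generator outputs distribution parameters, the likelihood of a data point can be evaluated in closed form. The following sketch shows the two cases named on the slide; the function names and the assumption of independent pixels are illustrative choices, not taken from the course material.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, std):
    """log N(x; mean, std^2), summed over independent pixels."""
    var = std ** 2
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)))

def bernoulli_log_likelihood(x, prob, eps=1e-12):
    """log Bernoulli(x; prob) for binary pixels, summed over pixels."""
    prob = np.clip(prob, eps, 1 - eps)   # avoid log(0)
    return float(np.sum(x * np.log(prob) + (1 - x) * np.log(1 - prob)))

# A binary "image" with four pixels and two candidate generator outputs.
x_binary = np.array([1.0, 0.0, 1.0, 1.0])
good = bernoulli_log_likelihood(x_binary, np.array([0.9, 0.1, 0.9, 0.9]))
bad = bernoulli_log_likelihood(x_binary, np.array([0.1, 0.9, 0.1, 0.1]))
```

With a fixed variance, maximizing the Gaussian log-likelihood is the same as minimizing an L2 loss on the predicted mean, which connects this view back to the plain autoencoder.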

Variational Autoencoders (VAEs): Naïve Sampling (Monte-Carlo)
SGD approximates the expected values over samples.
In each training iteration, sample $z$ from $p(z)$ and $x$ randomly from the dataset, pass $z$ through the generator with parameters $\theta$, and maximize $p_\theta(x \mid z)$.
Loss function: $-\log p_\theta(x \mid z)$
Problem: only the few $z$ that decode close to $x$ have a non-zero loss gradient for $x$, so few pairs $(x, z)$ have non-zero gradients.
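The scarcity of useful $(x, z)$ pairs can be illustrated with a toy experiment. This sketch assumes a 2-D latent space, a trivial generator whose output mean is $z$ itself, and a narrow Gaussian output distribution; all of these are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
std = 0.1                          # assumed output noise of the toy generator
x = np.array([1.0, 1.0])           # one data point
z = rng.normal(size=(10000, 2))    # latent samples from p(z) = N(0, I)

# Toy generator (an assumption): the mean of p(x | z) is z itself.
log_lik = np.sum(-0.5 * np.log(2 * np.pi * std**2) - (x - z) ** 2 / (2 * std**2), axis=1)

# Best achievable log-likelihood, reached when z decodes exactly to x.
best = 2 * (-0.5 * np.log(2 * np.pi * std**2))
# Fraction of samples within 3 standard deviations of x (log-likelihood gap below 4.5).
useful_fraction = float(np.mean(log_lik > best - 4.5))
```

Only a small fraction of the prior samples explain this data point at all, and the fraction shrinks quickly as the latent dimension grows or the output distribution narrows.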

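Variational Autoencoders (VAEs): The Encoder
During training, another network, the encoder $q_\phi(z \mid x)$ with parameters $\phi$, can guess a good $z$ for a given $x$.
The support of $q_\phi(z \mid x)$ should be much smaller than that of $p(z)$.
Sampling $x$ from the dataset and then $z$ from $q_\phi(z \mid x)$ also gives us the data point $x$ that belongs to $z$.
Loss function: $-\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z)$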

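Variational Autoencoders (VAEs): The Encoder
Can we still easily sample a new $z$? We need to make sure that $q_\phi(z \mid x)$ approximates $p(z)$, so we regularize with the KL-divergence.
Loss function: $-\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) + \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))$
The negative loss can be shown to be a lower bound for the log-likelihood $\log p_\theta(x)$, and equal to it if $q_\phi(z \mid x) = p_\theta(z \mid x)$.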
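When the encoder outputs a diagonal Gaussian and the prior is a standard normal, the KL regularizer has a well-known closed form. The sketch below assumes exactly that setting; the function names are illustrative.

```python
import numpy as np

def kl_to_standard_normal(mean, std):
    """KL( N(mean, diag(std^2)) || N(0, I) ), summed over latent dimensions."""
    return float(0.5 * np.sum(mean**2 + std**2 - 2.0 * np.log(std) - 1.0))

def negative_elbo(reconstruction_log_lik, mean, std):
    """Loss = -log p(x | z) + KL(q(z | x) || p(z)) for one sample z ~ q(z | x)."""
    return -reconstruction_log_lik + kl_to_standard_normal(mean, std)

# The regularizer vanishes only when the encoder output equals the prior.
kl_prior = kl_to_standard_normal(np.zeros(3), np.ones(3))
kl_shifted = kl_to_standard_normal(np.array([1.0]), np.array([1.0]))
```

The KL term pulls every per-sample encoder distribution toward the prior, which is what later allows new latent codes to be drawn from the prior alone.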

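Reparameterization Trick
Problem: we cannot backpropagate through the sampling step $z \sim q_\phi(z \mid x)$ to the encoder with parameters $\phi$.
Example when $q_\phi(z \mid x) = \mathcal{N}(\mu_\phi(x), \sigma_\phi(x)^2)$: $z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$.
The sample $\epsilon$ does not depend on the parameters, so backpropagation can pass through $z$ to the encoder and on from the generator with parameters $\theta$.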
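A minimal sketch of the trick for the Gaussian case: the sample is written as a deterministic function of the encoder outputs and an independent noise draw, so its derivatives with respect to mean and standard deviation are trivial. The function name and the explicit gradient outputs are assumptions for illustration; an autodiff framework would compute these automatically.

```python
import numpy as np

def sample_latent(mean, std, rng):
    """Draw z ~ N(mean, std^2) as a deterministic function of (mean, std) and noise."""
    eps = rng.standard_normal(size=np.shape(mean))   # does not depend on parameters
    z = mean + std * eps
    # Derivatives of z with respect to the encoder outputs:
    dz_dmean = np.ones_like(eps)
    dz_dstd = eps
    return z, dz_dmean, dz_dstd

rng = np.random.default_rng(0)
mean, std = np.full(100000, 2.0), np.full(100000, 0.5)
z, dz_dmean, dz_dstd = sample_latent(mean, std, rng)
```

The samples still follow the intended distribution, while the randomness has been moved into an input that needs no gradient.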

Generating Data
Sample $z$ from $p(z)$ and pass it through the generator with parameters $\theta$.
Figure: generated samples for MNIST and Frey Faces.
Image credit: Auto-Encoding Variational Bayes, Kingma and Welling

Demos
VAE on MNIST: http://dpkingma.com/sgvb_mnist_demo/demo.html
VAE on Faces: http://vdumoulin.github.io/morphing_faces/online_demo.html

Code example: Variational Autoencoder (variational_autoencoder.ipynb)

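Generative Adversarial Networks
Player 1: the generator. It scores if the discriminator cannot distinguish its output from a real image.
Player 2: the discriminator. It scores if it can distinguish between real and fake.
Diagram: the discriminator receives either a generated image or an image from the dataset and outputs real/fake.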

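Naïve Sampling Revisited
Loss function: $-\log p_\theta(x \mid z)$, with $z$ sampled from $p(z)$ and passed through the generator with parameters $\theta$, and $x$ drawn randomly from the dataset.
Only the few $z$ that decode close to $x$ have a non-zero loss gradient for $x$, so few pairs $(x, z)$ have non-zero gradients.
This is a problem of the maximum likelihood formulation.
Use a different loss: train a discriminator network to measure similarity.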

Why Adversarial?
$D_\phi$: discriminator with parameters $\phi$. $G_\theta$: generator with parameters $\theta$.
If the discriminator approximates $p_{data}(x)$: the sample $x^*$ at the maximum of $p_{data}$ has the lowest loss, so the optimal $p_\theta$ has a single mode at $x^*$ with small variance.
Image credit: How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, Ferenc Huszár

Why Adversarial?
For GANs, the discriminator $D_\phi$ (with parameters $\phi$) instead approximates $p_{data}(x) / (p_{data}(x) + p_\theta(x))$, which depends on the generator $G_\theta$ (with parameters $\theta$).
Image credit: How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, Ferenc Huszár

Why Adversarial?
VAEs: maximize the likelihood of data samples in $p_\theta$.
GANs: adversarial game; maximize the likelihood of generator samples in an approximation of $p_{data}$.
Image credit: How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, Ferenc Huszár

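GAN Objective
$D_\phi$: discriminator; $D_\phi(x)$ is the probability that $x$ is not fake. $G_\theta$: generator.
Fake/real classification loss (BCE): $L(\theta, \phi) = -\mathbb{E}_{x \sim p_{data}}[\log D_\phi(x)] - \mathbb{E}_{z \sim p(z)}[\log(1 - D_\phi(G_\theta(z)))]$
Discriminator objective: $\min_\phi L(\theta, \phi)$
Generator objective: $\max_\theta L(\theta, \phi)$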
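The fake/real classification loss can be written in a few lines. This sketch evaluates the binary cross-entropy from discriminator outputs on a batch of real and a batch of generated samples; the function name and the example probabilities are assumptions.

```python
import numpy as np

def bce_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy of the discriminator.

    d_real: D(x) for samples from the dataset (target: real).
    d_fake: D(G(z)) for generated samples (target: fake).
    """
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

# A discriminator that separates real from fake well has a low loss ...
confident = bce_loss(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
# ... while one that cannot tell them apart sits at 2 * log(2).
fooled = bce_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

The discriminator tries to push this value down, while the generator tries to push it up by changing the samples that produce `d_fake`.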

Non-saturating Heuristic
Generator loss is the negative binary cross-entropy, $\mathbb{E}_{z}[\log(1 - D_\phi(G_\theta(z)))]$: poor convergence.
Flip the target class instead of flipping the sign for the generator loss, $-\mathbb{E}_{z}[\log D_\phi(G_\theta(z))]$: good convergence, like BCE.
Plot: negative BCE compared with BCE with flipped target.
Image credit: NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow
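The difference in convergence comes from the gradient the generator receives when the discriminator confidently rejects its samples. The sketch below assumes the discriminator output is a sigmoid of a logit and compares the derivative of both generator losses with respect to that logit.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def saturating_grad(logit):
    """d/d(logit) of log(1 - D), the negative-BCE generator loss, with D = sigmoid(logit)."""
    return -sigmoid(logit)

def non_saturating_grad(logit):
    """d/d(logit) of -log(D), the BCE generator loss with flipped target."""
    return -(1.0 - sigmoid(logit))

# Early in training the discriminator easily rejects fakes: D(G(z)) is close to 0.
logit = -5.0
weak = abs(saturating_grad(logit))
strong = abs(non_saturating_grad(logit))
```

With the negative BCE the gradient almost vanishes exactly when the generator is at its worst, while the flipped-target loss gives a gradient close to its maximum magnitude there.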

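GAN Training
Discriminator training: with $x$ from the dataset and a sample $z$, update the discriminator $D_\phi$ on the loss $-\log D_\phi(x) - \log(1 - D_\phi(G_\theta(z)))$.
Generator training: with a sample $z$, update the generator $G_\theta$ on the loss $-\log D_\phi(G_\theta(z))$.
Interleave both in each training step.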
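The interleaved updates can be sketched on a toy problem with hand-written gradients. Everything about the setup is an assumption made to keep the example tiny: 1-D data from $\mathcal{N}(3, 1)$, a generator that only learns a shift $b$, a logistic-regression discriminator, and the non-saturating generator loss. It is not the course notebook and makes no claim about convergence.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_toy_gan(steps, lr=0.05, batch=64, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Toy assumptions: data ~ N(3, 1), generator G(z) = z + b,
    discriminator D(x) = sigmoid(w * x + c).
    """
    rng = np.random.default_rng(seed)
    w, c, b = 0.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        # Discriminator step: descend -log D(x) - log(1 - D(G(z))).
        x_real = rng.normal(3.0, 1.0, size=batch)
        x_fake = rng.normal(size=batch) + b
        d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
        grad_w = np.mean(-(1 - d_real) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(-(1 - d_real)) + np.mean(d_fake)
        w, c = w - lr * grad_w, c - lr * grad_c
        # Generator step: descend the non-saturating loss -log D(G(z)).
        x_fake = rng.normal(size=batch) + b
        d_fake = sigmoid(w * x_fake + c)
        grad_b = np.mean(-(1 - d_fake) * w)
        b = b - lr * grad_b
        history.append((w, c, b))
    return history

first_step = train_toy_gan(steps=1)[0]
long_run = np.array(train_toy_gan(steps=200))
```

After the first step the discriminator weight is positive (real samples are larger than fakes) and the generator shift moves toward the data. Over a longer run the parameters may oscillate rather than settle.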

DCGAN
The first paper to successfully use CNNs with GANs, owing to components that were novel at the time, such as batch normalization and ReLUs.
Image credit: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al.

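InfoGAN
The generator $G_\theta$ receives a latent code in addition to the noise sample; alongside the discriminator $D_\phi$, training maximizes the mutual information between the latent code and the generated image.
Figure: generated samples when varying the latent code.
Image credit: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, Chen et al.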

Code example: Generative Adversarial Network (gan.ipynb)

Conditional GANs (CGANs)
Learn a mapping between images from example pairs.
Approximate sampling from a conditional distribution.
Image credit: Image-to-Image Translation with Conditional Adversarial Nets, Isola et al.

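Conditional GANs
Both networks additionally receive the condition $c$.
Discriminator training: with a pair $(c, x)$ from the dataset and a sample $z$, update the discriminator $D_\phi$ on the loss $-\log D_\phi(c, x) - \log(1 - D_\phi(c, G_\theta(c, z)))$.
Generator training: with a condition $c$ and a sample $z$, update the generator $G_\theta$ on the loss $-\log D_\phi(c, G_\theta(c, z))$.
Image credit: Image-to-Image Translation with Conditional Adversarial Nets, Isola et al.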

Conditional GANs: Low Variation per Condition
Generator and discriminator training proceed as for conditional GANs in general, with the discriminator seeing generated images and images from the dataset.
The noise sample $z$ is often omitted in favor of dropout in the generator.
Image credit: Image-to-Image Translation with Conditional Adversarial Nets, Isola et al.

Demos
CGAN: https://affinelayer.com/pixsrv/index.html

CycleGANs
Less supervision than CGANs: a mapping between unpaired datasets.
Two GANs + cycle consistency.
Image credit: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al.

CycleGAN: Two GANs
Two generators ($G_1$, $G_2$) and two discriminators ($D_1$, $D_2$) form two GANs, one per mapping direction.
The GANs are not conditional, so this alone does not yet constrain generator input and output to match.
Image credit: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al.

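CycleGAN: Cycle Consistency
Passing an image through generator 1 and then generator 2 should return the original image, with L1 loss function $\|G_2(G_1(x)) - x\|_1$.
Likewise for the other direction, through generator 2 and then generator 1, with L1 loss function $\|G_1(G_2(y)) - y\|_1$.
Image credit: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al.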
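The cycle consistency term is straightforward to compute once both generators are available. In this sketch the generators are hypothetical stand-in functions on 1-D arrays (not neural networks), chosen so the result can be checked by hand.

```python
import numpy as np

def cycle_consistency_loss(x, y, g1, g2):
    """L1 cycle loss: x -> g1 -> g2 should return x, and y -> g2 -> g1 should return y."""
    forward = np.mean(np.abs(g2(g1(x)) - x))
    backward = np.mean(np.abs(g1(g2(y)) - y))
    return float(forward + backward)

# Hypothetical stand-ins for the two generators.
g1 = lambda x: 2.0 * x + 1.0            # maps domain X to domain Y
g2_inverse = lambda y: (y - 1.0) / 2.0  # exact inverse of g1
g2_other = lambda y: y / 2.0            # plausible-looking but inconsistent mapping

x = np.linspace(-1.0, 1.0, 5)
y = np.linspace(0.0, 4.0, 5)
consistent = cycle_consistency_loss(x, y, g1, g2_inverse)
inconsistent = cycle_consistency_loss(x, y, g1, g2_other)
```

The loss is zero only when the two generators invert each other, which is what ties generator input and output together without paired training data.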

Unstable Training
GAN training can be unstable. Three current research problems (which may be related):
Reaching a Nash equilibrium (the gradient for both the generator loss and the discriminator loss is 0).
$p_\theta$ and $p_{data}$ initially don't overlap.
Mode collapse.

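GAN Training
Vector-valued loss: $(L_D(\theta, \phi), L_G(\theta, \phi))$, with one component per player.
In each iteration, gradient descent approximately follows the vector $v(\theta, \phi) = (-\nabla_\phi L_D, -\nabla_\theta L_G)$ over the parameter space $(\theta, \phi)$.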

Reaching Nash Equilibrium
Figure: an example gradient field with an example Nash equilibrium.
Image credit: GANs are Broken in More than One Way: The Numerics of GANs, Ferenc Huszár
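Why following such a gradient field may fail to reach the equilibrium can be seen in a two-parameter game. The bilinear game $f(x, y) = x \cdot y$ used below is a standard toy example and an assumption of this sketch, not the example shown in the figure. Its only Nash equilibrium is at $(0, 0)$.

```python
import numpy as np

def simultaneous_gradient_steps(x, y, lr, steps):
    """Player 1 minimizes f(x, y) = x * y over x, player 2 minimizes -f over y."""
    radii = [float(np.hypot(x, y))]   # distance from the equilibrium at (0, 0)
    for _ in range(steps):
        grad_x, grad_y = y, x         # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(float(np.hypot(x, y)))
    return radii

radii = simultaneous_gradient_steps(x=1.0, y=0.0, lr=0.1, steps=100)
```

The gradient field rotates around the equilibrium, so each simultaneous step increases the distance by a factor of $\sqrt{1 + \mathrm{lr}^2}$; the iterates spiral outward instead of converging.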

Reaching Nash Equilibrium
Solution attempt: relaxation with an additional term.
No relaxation has cycles.
Full relaxation introduces bad Nash equilibria.
A mixture works sometimes.
Image credit: GANs are Broken in More than One Way: The Numerics of GANs, Ferenc Huszár

Generator and Data Distribution Don't Overlap
Instance noise: adding noise to generated and real images.
Wasserstein GANs: the earth mover's distance (EMD) as the distance between $p_\theta$ and $p_{data}$.
Roth et al. suggest an analytic convolution with a Gaussian (Stabilizing Training of Generative Adversarial Networks through Regularization, Roth et al. 2017).
Image credit: Amortised MAP Inference for Image Super-resolution, Sønderby et al.
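The appeal of the earth mover's distance for non-overlapping distributions can be illustrated in one dimension, where it reduces to comparing sorted samples. This is a sketch for equally sized 1-D sample sets only; Wasserstein GANs estimate the distance with a critic network instead. The function name and toy data are assumptions.

```python
import numpy as np

def emd_1d(a, b):
    """Earth mover's distance between two equally sized 1-D sample sets."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
real = rng.uniform(0.0, 1.0, size=1000)
near = real + 2.0     # generated samples that do not overlap the data
far = real + 10.0     # generated samples that are even further away

emd_near = emd_1d(real, near)
emd_far = emd_1d(real, far)
```

Even though neither generated set overlaps the data, the distance still tells the generator which one is closer, so it provides a useful training signal from the start.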

Mode Collapse
Optimal $p_\theta$: only covers one or a few modes of $p_{data}$.
Figure: generated distribution after n training steps, for n from 0 to 50000 in steps of 5000.
Image credit: Wasserstein GAN, Arjovsky et al.; Unrolled Generative Adversarial Networks, Metz et al.

Mode Collapse
Solution attempts:
Minibatch comparisons: the discriminator can compare instances in a minibatch (Improved Techniques for Training GANs, Salimans et al.).
Unrolled GANs: take k steps with the discriminator in each iteration, and backpropagate through all of them to update the generator.
Figure: standard GAN compared with unrolled GAN with k=5, each after n training steps, for n from 0 to 50000 in steps of 5000.
Image credit: Wasserstein GAN, Arjovsky et al.; Unrolled Generative Adversarial Networks, Metz et al.

Progressive GANs
Resolution is increased progressively during training.
Other tricks are also used, such as minibatch statistics and normalizing feature vectors.
Image credit: Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al.

Disentanglement
Entangled: different properties may be mixed up over all dimensions.
Disentangled: different properties are in different dimensions.
Figure: grids of samples arranged by a specified property (number, character) against the other properties.
Image credit: Disentangling factors of variation in deep representations using adversarial training, Mathieu et al.

Summary
Autoencoders: can infer a useful latent representation for a dataset, but are bad generators.
VAEs: can infer a useful latent representation for a dataset; better generators due to latent space regularization; lower quality reconstructions and generated samples (usually blurry).
GANs: cannot find a latent representation for a given sample (no encoder); usually better generators than VAEs; training is currently unstable (active research).

Thank you!
http://geometry.cs.ucl.ac.uk/dl4g/