Variational Autoencoders
Sargur N. Srihari
srihari@cedar.buffalo.edu
Topics
1. Generative Model
2. Standard Autoencoder
3. Variational Autoencoders (VAE)
Generative Model
A variational autoencoder (VAE) is a generative model, i.e., it is able to generate fake samples that look like samples from the training data. With MNIST data, these fake samples would be synthetic images of handwritten digits. The VAE provides us with a space, the latent space, from which we can sample points. Any of these points can be decoded into a reasonable image of a handwritten digit.
Standard Autoencoder
A standard autoencoder trained on MNIST digits may not produce a reasonable output when an image of the letter 'V' is given as input.
http://ijdykeman.github.io/ml/2016/12/21/cvae.html
Normal Distribution of MNIST
A standard normal distribution: this is how we would like the points corresponding to MNIST digit images to be distributed in the latent space.
Decoder of a VAE
3s lie in the first quadrant, 6s in the third quadrant of the latent space.
Encoder of a VAE
MNIST Variational Autoencoder
Structure of Latent Space
The decoder expects the latent space to be normally distributed. Whether the sum of the distributions produced by the encoder approximates a standard normal distribution is measured by the KL divergence.
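A minimal PyTorch sketch of the closed-form KL term used in practice, assuming a diagonal Gaussian encoder q(z|x) = N(mu, sigma^2 I); the function name and the (mu, logvar) parameterization are illustrative assumptions, not from the slides.

import torch

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over
    # latent dimensions and averaged over the batch:
    #   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return kl.mean()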
VAE Training
Due to the random variable between input and output, the network cannot be trained using backprop directly. Instead, backprop proceeds through the parameters of the latent distribution. This is called the reparameterization trick: a sample z ~ N(μ, Σ) is rewritten as z = μ + Σ^(1/2) ε with ε ~ N(0, I), where the covariance matrix Σ is diagonal. Due to the randomness involved, training is called Stochastic Gradient Variational Bayes (SGVB).
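A minimal sketch of the reparameterization trick in PyTorch, assuming the encoder outputs mu and logvar (illustrative names); sampling is rewritten so gradients flow through mu and logvar while the noise eps carries the randomness.

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I); for a diagonal
    # covariance, Sigma^(1/2) is just the elementwise std dev.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps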
Conditional VAE
The digit label is fed to the network as a one-hot vector.
Generating Images from a VAE
Feed the decoder a random point in the latent space together with the desired digit. Even if the same latent point is used for two different digits, the process works correctly, since the latent space only encodes features such as stroke width or angle.
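A minimal sketch of conditional generation, assuming a trained decoder that takes the concatenation of a latent point and a one-hot digit label; the decoder interface and dimensions here are assumptions for illustration, not the slides' exact model.

import torch
import torch.nn.functional as F

def generate_digit(decoder, digit, latent_dim=2, num_classes=10):
    # Sample one latent point and pair it with a one-hot label;
    # reusing the same z with a different label changes the digit
    # identity while keeping style features (stroke width, angle).
    z = torch.randn(1, latent_dim)
    y = F.one_hot(torch.tensor([digit]), num_classes).float()
    return decoder(torch.cat([z, y], dim=1))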
Samples Generated from a VAE
Images produced by fixing the digit input to the decoder and sampling from the latent space. The digits vary in style, but images in a single row are clearly of the same digit.
VAE for Radiology
Combines two types of models, discriminative and generative, into a single framework.
Right: a generative PGM with inputs
1. class label y (diseases)
2. nuisance variables s (hospital identifiers)
3. latent variables z (size, shape, other brain properties)
It provides the causality of the observation.
Left: a discriminative deep neural network model whose input is the observed variables. It generates posterior distributions over the latent variables and possibly (if unobserved) the class labels, performing the inference of latent variables necessary for the variational updates.
The two models are trained jointly using the variational EM framework.
Variational Autoencoder (VAE)
The VAE is a directed model that uses learned approximate inference and is trained purely with gradient-based methods. To generate a sample from the model, first draw a sample z from the code distribution p_model(z). The sample is then run through a differentiable generator network g(z), and x is sampled from the distribution p_model(x; g(z)) = p_model(x | g(z)). However, during training the approximate inference network (or encoder) q(z | x) is used to obtain z, and p_model(x | z) is viewed as a decoder network.
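A minimal sketch of the generation path just described, assuming a trained generator network g and a standard normal code distribution; the Bernoulli output distribution is one common choice (e.g., for binarized MNIST) assumed here for illustration.

import torch

def sample_from_vae(g, latent_dim=2):
    # Draw z ~ p_model(z) = N(0, I), run it through the generator,
    # then sample x from p_model(x | g(z)); here g outputs Bernoulli
    # pixel probabilities.
    z = torch.randn(1, latent_dim)
    probs = g(z)
    return torch.bernoulli(probs)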
The VAE Model
A method for modeling a data distribution using a collection of independent latent variables.
Generative model: p(x, z) = p(x | z) p(z), where x is a random variable representing the observed data and z is a collection of latent variables. p(x | z) is parameterized by a deep neural network (the decoder). The components of z are independent Bernoulli or Gaussian.
Learned approximate inference is trained using gradient descent: q(z | x) = N(μ, σ²I), whose parameters are given by another deep network (the encoder). Thus we have z ~ Enc(x) = q(z | x) and y ~ Dec(z) = p(x | z).
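A minimal PyTorch sketch of the two networks, assuming MNIST-sized inputs, a 2-D Gaussian latent code, and Bernoulli pixel outputs; the layer sizes are illustrative assumptions.

import torch.nn as nn

class Encoder(nn.Module):
    # Maps x to the parameters (mu, logvar) of q(z|x) = N(mu, sigma^2 I).
    def __init__(self, x_dim=784, h_dim=400, z_dim=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    # Maps z to Bernoulli pixel probabilities parameterizing p(x|z).
    def __init__(self, z_dim=2, h_dim=400, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)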
Key Insight of VAEs
VAEs can be trained by maximizing the variational lower bound L(q) associated with data point x:
L(q) = E_{z~q(z|x)} [log p_model(z, x)] + H(q(z|x))
where E_{z~q(z|x)} [log p_model(z, x)] is the joint log-likelihood of the visible and hidden variables under the approximate posterior over the latent variables, and H(q(z|x)) is the entropy of the approximate posterior. When q is chosen to be a Gaussian with noise added to a predicted mean, maximizing this entropy term encourages increasing σ.
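A minimal sketch of the negative lower bound as a training loss, written in the equivalent reconstruction-plus-KL form L(q) = E log p_model(x|z) - D_KL(q(z|x) || p(z)); this rearrangement is standard but differs from the slide's joint-log-likelihood-plus-entropy form, and the function and variable names are illustrative.

import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Negative ELBO = reconstruction term + KL term.
    # Bernoulli log-likelihood -> binary cross-entropy; the KL term
    # is the closed form for a diagonal Gaussian vs. N(0, I).
    recon = F.binary_cross_entropy(recon_x, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl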
VAE: 2-D coordinate systems learned for high-dimensional manifolds
Disentangling FoVs
During training, the only supervision is class labels.
Specified FoVs: images captured from different viewpoints. Strong supervision: pairs of images with two different objects at the same viewing angle.
Unspecified FoVs: labels unavailable.
A disentanglement method: combine a variational autoencoder with adversarial training.