Sliced Wasserstein Generative Models


Jiqing Wu*, Zhiwu Huang*, Wen Li, Janine Thoma, Luc Van Gool

*Equal contribution. Computer Vision Lab, ETH Zurich, Switzerland; VISICS, KU Leuven, Belgium. Correspondence to: Jiqing Wu <jiqing.wu@vision.ee.ethz.ch>, Zhiwu Huang <zhiwu.huang@vision.ee.ethz.ch>.

Abstract

In this paper, we introduce a model of sliced optimal transport (SOT), which measures distribution affinity with the sliced Wasserstein distance (SWD). Since SWD enjoys the property of factorizing a high-dimensional joint distribution into multiple one-dimensional marginal distributions, its dual and primal forms can be approximated more easily than the Wasserstein distance (WD). We therefore propose two types of differentiable SOT blocks that equip the modern generative frameworks, Auto-Encoders (AEs) and Generative Adversarial Networks (GANs), with the primal and dual forms of SWD. The superiority of our SWAE and SWGAN over state-of-the-art generative models is studied both qualitatively and quantitatively on standard benchmarks.

1. Introduction

The domain of unsupervised learning has experienced tremendous advances due to its potential to capitalize on large pools of unlabeled data. One of the most promising approaches is generative modeling, which typically estimates the real data distribution either by variational inference using Auto-Encoders (VAEs) (Kingma & Welling, 2013) or by an adversarial process with Generative Adversarial Networks (GANs) (Goodfellow et al., 2014). In other words, rather than estimating the real data distribution directly, these models learn a map from a parameterized distribution to the high-dimensional real data distribution. During the map learning process, a statistical divergence such as the Kullback-Leibler (KL) or Jensen-Shannon (JS) divergence between the model distribution and the real data distribution is minimized on a low-dimensional manifold or in the latent space, such that visually pleasing samples can be generated.

State-of-the-art generative models, Wasserstein GANs (Arjovsky et al., 2017; Gulrajani et al., 2017; Salimans et al., 2018; Wei et al., 2018; Miyato et al., 2018) and Wasserstein AEs (Tolstikhin et al., 2018), make use of optimal transport (OT) theory to measure the distribution distance with the Wasserstein distance (WD), which has been shown to have better properties than the KL and JS divergences employed in earlier generative models. Nevertheless, WD has some bottlenecks. For example, in high-dimensional space the primal form of WD is known to be intractable, although some works (Tolstikhin et al., 2018; Genevay et al., 2017; Salimans et al., 2018) have attempted to propose relaxed unconstrained versions of the primal form. While the dual form of WD can be derived more easily, it still suffers from the difficulty of approximating the k-Lipschitz constraint required by the Wasserstein metric in (Arjovsky et al., 2017; Gulrajani et al., 2017). In particular, the weight clipping technique used in (Arjovsky et al., 2017) merely covers a subset of the k-Lipschitz functions for some k, as studied by (Gulrajani et al., 2017), while the 1-Lipschitz gradient penalty adopted in (Gulrajani et al., 2017) only relies on a limited number of samples to approximate the k-Lipschitz constraint on a high-dimensional domain (Wei et al., 2018).
To overcome this weakness of the original OT in generative models, we exploit a new generative modeling scheme from the viewpoint of sliced OT (SOT) for better distribution transfer. Since the resulting sliced Wasserstein distance (SWD) is obtained by decomposing a single n-dimensional distribution into its n one-dimensional marginal distributions, all of which can be approximated well and independently as studied in (Bonneel et al., 2015; Kolouri et al., 2016; 2017), the SOT theory is able to offer a more favorable model than the state-of-the-art generative models employing WD. In particular, we introduce SOT-based network blocks for the primal and dual forms of SWD, and incorporate them into modern generative frameworks (i.e., AEs and GANs). Our contributions can be summarized as follows:

- We propose a novel SWAE model, which equips classic AEs with differentiable SOT blocks. These blocks approximate the SWD primal form between the encoder distribution and the prior distribution progressively, in an end-to-end network learning fashion.

- Following the standard paradigm of GANs, we introduce a new SWGAN model by applying a dual SOT block to the discriminator, such that it can easily approximate the dual form of SWD.

- In order to train the Radon transform matrices embedded in the SOT blocks, we generalize the standard network optimization algorithm to Stiefel manifolds.

- We evaluate our SWAE and SWGAN models on standard benchmarks and report visual results and quantitative scores with the Fréchet Inception Distance (FID), both of which show superior performance compared to the state-of-the-art generative models.

2. From Wasserstein Distance to Sliced Wasserstein Distance

2.1. Wasserstein Distance (WD) and Related Models

In the generative modeling literature, we often need to measure the agreement between two probability distributions P_X and P_Y. There are many ways to do so, among which the classical Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences are widely used. Recently, (Arjovsky et al., 2017) introduced a more powerful family of statistical distances, namely the Wasserstein distance (WD), to address mode collapse and instability issues. WD was originally studied in the context of optimal transport (OT) theory (Villani, 2008). Its primal formulation is

W_p(P_X, P_Y) = \inf_{\gamma \in \Pi(P_X, P_Y)} E_{(X, Y) \sim \gamma}[c(X, Y)],   (1)

where \Pi(P_X, P_Y) denotes the set of all joint distributions \gamma(X, Y) whose marginals are P_X and P_Y respectively, and c : X \times Y \to R_+ can be any general cost function. It is normally assumed that (X, d) is a metric space with metric d and X = Y. The cost function is set to c(X, Y) = d^p(X, Y), where p > 0. In this case, the p-th root of Eq. 1 with respect to d^p is the so-called p-WD. For p = 1, one can show that the Kantorovich dual of WD is

W_1(P_X, P_Y) = \sup_{f \in \mathrm{Lip}_1} E_{X \sim P_X}[f(X)] - E_{Y \sim P_Y}[f(Y)],   (2)

where \mathrm{Lip}_1 is the set of all 1-Lipschitz functions. As a side note, the dual 1-WD becomes k W_1 if we replace \mathrm{Lip}_1 with \mathrm{Lip}_k for k > 0.

Wasserstein AEs (WAEs). The paradigm of classic AE-based generative models such as VAEs employs variational inference to minimize an upper bound on the discrepancy between the model distribution of a decoder G : Z → X and the real image distribution, while imposing a regularization to match the model distribution of an encoder Q : X → Z with the prior distribution. Unfortunately, the constraint on Q typically makes the variational problem hard to solve. As studied in (Tolstikhin et al., 2018), the latent codes of different samples are generally forced to stay close to each other, leading to worse reconstruction and making the generative models produce blurrier samples. To solve this problem, WAEs (Tolstikhin et al., 2018) apply an OT cost to the latent variable model P_G, reaching state-of-the-art results among AE-based generative models (Kingma & Welling, 2013; Rezende et al., 2014; Salimans et al., 2015; Makhzani et al., 2016; Mescheder et al., 2017). Specifically, the WAEs proposed a relaxed OT formulation to estimate the primal form of WD for decoding, and introduced an additional divergence D_Z to regularize the encoding map:

\min_G \inf_{Q(Z|X) \in \mathcal{Q}} E_{P_X} E_{Q(Z|X)}[c(X, G(Z))] + \lambda D_Z(Q_Z, P_Z),   (3)

where Z is random noise, \mathcal{Q} is any nonparametric set of encoders, c(X, Y) = d^p(X, Y) is the OT cost function (in practice set to c(X, Y) = \|X - Y\|_2^2), \lambda > 0 is a hyperparameter, and D_Z is a divergence between the marginal distribution Q_Z and the prior distribution P_Z of Z.
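To illustrate Eq. 3, here is a minimal PyTorch-style sketch of one WAE training step that instantiates D_Z with a simple RBF-kernel MMD estimator; the encoder Q, decoder G, bandwidth sigma and weight lam are hypothetical placeholders rather than the configuration used in (Tolstikhin et al., 2018).

```python
import torch

def rbf_mmd(z_q, z_p, sigma=1.0):
    # Biased MMD^2 estimate between encoded codes z_q ~ Q_Z and prior samples z_p ~ P_Z.
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(z_q, z_q).mean() + kernel(z_p, z_p).mean() - 2 * kernel(z_q, z_p).mean()

def wae_loss(Q, G, x, lam=10.0):
    # Eq. 3: reconstruction cost c(X, G(Q(X))) = ||X - G(Q(X))||_2^2 plus lambda * D_Z(Q_Z, P_Z).
    z_q = Q(x)                        # encoded latent codes, shape (n, K)
    x_rec = G(z_q)                    # decoded reconstructions
    rec = ((x - x_rec) ** 2).flatten(1).sum(dim=1).mean()
    z_p = torch.randn_like(z_q)       # samples from the prior P_Z = N(0, I)
    return rec + lam * rbf_mmd(z_q, z_p)
```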
In (Tolstikhin et al., 2018), D_Z is finally instantiated by maximum mean discrepancy (MMD) or a GAN, both of which can be regarded as distribution matching strategies. This differs from the OT plan employed for the decoding, since it is indeed non-trivial to apply the primal form of WD to the constraint on the encoding map, especially when the latent codes have more than a few hundred dimensions.

Wasserstein GANs (WGANs). Following Generative Adversarial Networks (GANs) (Goodfellow et al., 2014; Radford et al., 2015; Zhao et al., 2016; Berthelot et al., 2017; Mao et al., 2017), WGANs also establish a min-max adversarial game between two competing networks, where a generator network G maps a source of noise to the input space, while a discriminator network D receives either a generated sample or a true data sample and must distinguish between them. To stabilize GAN training for better image generation, the original WGAN (Arjovsky et al., 2017) proposed to approximate the dual form of the 1-WD by adopting a weight clipping strategy, which however satisfies the k-Lipschitz constraint poorly. To alleviate this problem in the approximation of the dual form of 1-WD, the improved training of Wasserstein GAN (WGAN-GP) (Gulrajani et al., 2017) penalizes the norm of the gradient of the critic with respect to a limited number of input samples. In particular, this gradient penalty is simply added to the basic WGAN loss (i.e., the dual form of 1-WD), yielding the following full objective:

\min_G \max_D E_{X \sim P_X}[D(X)] - E_{G(Z) \sim P_G}[D(G(Z))] + \lambda E_{\hat{X} \sim P_{\hat{X}}}[(\|\nabla_{\hat{X}} D(\hat{X})\|_2 - 1)^2],   (4)

where G(\cdot), D(\cdot) denote the generator and discriminator respectively, Z is random noise, \hat{X} is a random sample from the distribution P_{\hat{X}} obtained by sampling uniformly along straight lines between pairs of points sampled from P_X and P_G, and \nabla_{\hat{X}} D(\hat{X}) is the gradient with respect to \hat{X}.
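For reference, a minimal PyTorch-style sketch of the gradient-penalty term in Eq. 4 (the standard WGAN-GP penalty, not this paper's SWGAN objective) is given below; it assumes image batches of shape (n, C, H, W), and the critic D and batch names are placeholders.

```python
import torch

def gradient_penalty(D, x_real, x_fake, lam=10.0):
    # Interpolate uniformly along straight lines between real and generated samples (P_X_hat).
    eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)
    d_hat = D(x_hat)
    # Gradient of the critic with respect to the interpolated samples.
    grads = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    # Penalize deviation of the gradient norm from 1 (the 1-Lipschitz target in Eq. 4).
    return lam * ((grad_norm - 1) ** 2).mean()
```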

As pointed out in (Liu et al., 2018), it is not sufficient to approximate the k-Lipschitz constraint on a high-dimensional domain with a limited number of samples.

2.2. Sliced Wasserstein Distance (SWD) and Related Models

The idea underlying the sliced Wasserstein distance (SWD) is to slice the space with lines passing through the origin, to project the measures onto these lines where the distances are computed, and to integrate those distances over all possible lines. Formally, the SWD is defined as

SW_p(P_X, P_Y) = \left( \int_{S^{N-1}} W_p^p(\mathcal{R}P_X(\cdot, \theta), \mathcal{R}P_Y(\cdot, \theta)) \, d\theta \right)^{1/p},   (5)

where S^{N-1} is the unit sphere in R^N, \theta \in S^{N-1}, and \mathcal{R} is the Radon transform, which maps P_X to the set of its integrals over the hyperplanes of R^N with respect to the angle \theta:

\mathcal{R}P_X(t, \theta) = \int_{R^N} p_X(x) \, \delta(t - x \cdot \theta) \, dx,   (6)

where \delta(\cdot) is a Dirac measure. Eq. 5 yields the explicit benefit of simpler one-dimensional density estimation compared to the direct high-dimensional density estimation required by WD. Meanwhile, several works (Bonneel et al., 2015; Kolouri et al., 2016; 2017) exploit the fact that the one-dimensional p-WD has a closed-form solution for the OT plan. More specifically, there exists a unique monotonically increasing transport map

\tau(x) = F_Y^{-1}(F_X(x)),   (7)

where F_X(x) and F_Y(y) are the cumulative distribution functions (CDFs) corresponding to the probability density functions (PDFs) p_X and p_Y, computed by F_X(x) = \int_{-\infty}^{x} p_X(t) dt and F_Y(y) = \int_{-\infty}^{y} p_Y(t) dt. With this transport map, the one-dimensional p-WD between two distributions P_X and P_Y can consequently be computed as

W_p(P_X, P_Y) = \left( \int_X d^p(F_Y^{-1}(F_X(x)), x) \, dP_X(x) \right)^{1/p}.   (8)

Furthermore, as proven in (Bonnotte, 2013), the SWD is a valid distance and is equivalent to the WD. Accordingly, in contrast to the original WD, the SWD has a more promising potential to enhance modern generative modeling, especially when one only has access to samples of high-dimensional PDFs. In practice, the SWD can be approximated by a finite summation over projection angles, as done in most existing works (Pitié et al., 2007; Kolouri et al., 2016; Karras et al., 2018)¹. A better SWD approximation is achieved by the iterative distribution transfer (IDT) technique (Pitié et al., 2007), which has been proved to converge to an optimal SWD if the number of iterations is large enough. Specifically, at each iteration IDT first randomly samples an orthogonal Radon transform matrix \theta = [\theta_1, \ldots, \theta_N] \in R^{N \times N} for the approximation of the Radon transform \mathcal{R}, which yields a series of one-dimensional marginal PDFs, and then transfers the current distribution of the source data to the target distribution by matching their one-dimensional marginals with the map in Eq. 7. To minimize the SWD progressively, such an iteration is typically repeated many times.

3. Proposed Method

3.1. Overview

Having discussed the advantages of SWD over WD, in this section we make the first attempt to introduce the SOT plan into generative modeling. As mentioned before, existing methods including IDT typically demand a large number of iterations to achieve a favorable solution, and it is non-trivial to apply them directly in the context of neural network optimization. To overcome these limitations, we propose a novel SOT model, which adapts the IDT technique to a network setting.
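To make the one-dimensional machinery of Eqs. 5-8 concrete, the following NumPy sketch (an illustrative Monte-Carlo approximation, not the authors' code) estimates SW_p between two equally sized sample sets by drawing random orthogonal projection matrices, as IDT does, and matching sorted projections, which realizes the monotone map of Eq. 7 on samples.

```python
import numpy as np

def sliced_wasserstein(x, y, n_matrices=16, p=2, rng=np.random.default_rng(0)):
    """Monte-Carlo estimate of SW_p between samples x, y of shape (n, N), same n."""
    _, dim = x.shape
    total = 0.0
    for _ in range(n_matrices):
        # Random orthogonal matrix (QR of a Gaussian matrix), as sampled by IDT.
        theta, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
        for t in theta.T:                      # each column is one projection angle
            px, py = np.sort(x @ t), np.sort(y @ t)
            # Sorting realizes the monotone 1D map tau = F_Y^{-1}(F_X(.)) on samples (Eq. 7).
            total += np.mean(np.abs(px - py) ** p)
    return (total / (n_matrices * dim)) ** (1.0 / p)
```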
More specifically, this paper proposes to use a limited number of differentiable SOT blocks to optimize the primal form of SWD in the context of AEs. The improved approximation of the primal form of SWD over the latent space using the proposed primal SOT blocks leads to better generation results. We also propose a variant of our SOT blocks for the dual SWD case in the context of GANs. Benefiting from the Radon transform, which effectively factorizes high-dimensional joint distributions into one-dimensional marginal distributions, the dual form of SWD can be estimated more easily, leading to a better metric for generation.

3.2. Sliced Wasserstein AE (SWAE)

As the state-of-the-art AE-based generative model, (Tolstikhin et al., 2018) proposed an MMD- or GAN-based constraint for encoding and a relaxed OT plan for decoding. While this moderates the problem of latent codes being forced to stay close to each other, it does not impose a consistent OT constraint on the encoding, leading to a sub-optimal solution. Since it is highly non-trivial to approximate the primal form of the original WD on the latent codes, we instead propose to express the probabilistic encoder as a SOT plan, which aims to match the encoder distribution and the prior distribution in the latent space with the primal form of SWD.

¹ It is worth mentioning that (Karras et al., 2018) employed the approximation of SWD merely for quantitative comparison of different GANs, and did not use the SWD for the GAN loss.

Figure 1. The proposed sliced optimal transport (SOT) block for the SWD primal form (a)+(b) and dual form (a)+(c): (a) 1D Radon transform, (b) 1D PDF matching for the primal SWD, (c) k-Lipschitz mapping for the dual SWD.

Algorithm 1 The proposed SWAE algorithm
Require: the number of primal SOT blocks m, the batch size n, the encoder Q = S_p ∘ E with E : X → Y and S_p : Y → Z, the decoder G, and training hyperparameters.
repeat
  Sample real data x from P_X
  Sample Gaussian noise z from N(0, 1)
  Update Q = S_p ∘ E and G by descending 1/n Σ_{i=1}^{n} ||x_i - G(Q(x_i))||_2^2
until convergence

In particular, we introduce an implicit approximation of the primal form of SWD for the encoder, such that the full objective of the whole AE model avoids any explicit regularization. Formally, the objective of our SWAE model is

\min_G \inf_{Q(Z|X) \in \mathcal{Q}} E_{P_X} E_{Q(Z|X)}[\|X - G(Q(X))\|_2^2],   (9)

where Q and G indicate the encoder and decoder respectively, and Q is implicitly constrained by our proposed SOT model, which aims to optimize the primal form of SWD (Eq. 5). For the optimization of the primal form of SWD between the prior and encoder distributions, we design a type of differentiable SOT block consisting of a Radon transform and 1D PDF matching, as shown in Fig. 1 (a)+(b). Inspired by (Pitié et al., 2007), who show that a carefully selected sequence of Radon transform matrices leads to faster convergence to the optimal SWD (Eq. 5), we stack a limited number of differentiable SOT blocks in the encoder to learn a favorable sequence of Radon transform matrices, realizing a SOT plan in a deep learning manner.

To begin with, let us denote the input data as x = [x_1, \ldots, x_n] \in R^{N \times n} (N is the data dimension, n is the number of samples) for the encoder Q = S_p ∘ E, which first applies a common encoder E and then our designed primal SOT blocks S_p. The output of E is denoted by y = [y_1, \ldots, y_n] \in R^{K \times n}, with K and n being the latent dimension and sample number respectively. Then y is fed into the primal SOT block S_p, which is implemented with the following steps.

Step (a) of Fig. 1: First, we project the latent codes y with the Radon transform matrix \theta = [\theta_1, \ldots, \theta_K] \in R^{K \times K} via the map y \mapsto \theta^T y, which corresponds to the Radon transform projecting the K-dimensional distribution onto K one-dimensional marginal distributions.

Step (b) of Fig. 1: Then, we perform 1D PDF matching for the data [\theta_1^T y, \ldots, \theta_K^T y] using the map of Eq. 7. In the end, we remap the samples according to the 1D Radon transforms. Accordingly, the whole mapping function of each primal SOT block can be defined as

S_p(y) = y + \theta \begin{bmatrix} \tau_1(\theta_1^T y) - \theta_1^T y \\ \vdots \\ \tau_K(\theta_K^T y) - \theta_K^T y \end{bmatrix},   (10)

where \tau_i is the SOT map of Eq. 7 (i.e., \tau(y) = F_Z^{-1}(F_Y(y)), where F_Y and F_Z are the CDFs of the input data y and the input noise z respectively), which can be solved by using discrete look-up tables. In practice, to make the process differentiable, we employ piece-wise interpolation. More specifically, to approximate the CDFs in Eq. 7, we first estimate their PDFs. Technically, the PDFs can be estimated by hard histogram assignment of the target data y_i to bin centers c_k. However, to make this operation differentiable in the context of backpropagation, we propose a soft assignment version instead:

\hat{a}(y_i) = \frac{e^{-\alpha \|y_i - c_k\|^2}}{\sum_{k'} e^{-\alpha \|y_i - c_{k'}\|^2}},   (11)

which assigns the weight of the target data y_i to the bin center c_k proportionally to their proximity, relative to the proximities to the other bin centers.
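The soft assignment of Eq. 11 and the resulting differentiable CDF estimate can be sketched as follows. This is a simplified PyTorch illustration under our own assumptions about a fixed bin grid, not the exact SOT-block implementation; the transport map of Eq. 7 would then be obtained by piece-wise interpolation between the two estimated CDFs.

```python
import torch

def soft_histogram(v, centers, alpha=1.0):
    """Eq. 11: differentiable soft assignment of 1D samples v (shape (n,)) to bin centers."""
    d2 = (v.unsqueeze(1) - centers.unsqueeze(0)) ** 2   # squared distances, shape (n, n_bins)
    return torch.softmax(-alpha * d2, dim=1)            # each row sums to 1

def soft_cdf(v, centers, alpha=1.0):
    """Piecewise CDF estimate over the bin grid, usable inside backpropagation."""
    pdf = soft_histogram(v, centers, alpha).mean(dim=0)  # estimated bin masses
    return torch.cumsum(pdf, dim=0)                      # CDF evaluated at the bin centers
```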
The weight \hat{a}(y_i) ranges between 0 and 1, with the highest weight assigned to the closest bin center. \alpha is a parameter that controls the decay of the response with the magnitude of the distance. We remark that for \alpha \to \infty this setting recovers the original hard histogram assignment, with weight 1 for the closest bin center and 0 for all other bins. In practice, we set \alpha = 1. Note that the PDF estimation is performed on one-dimensional data, and thus a moderate number of samples suffices to estimate the distribution.

3.3. Sliced Wasserstein GANs (SWGAN)

Though stacking the primal SOT blocks enables our SWAE model to better match the encoder distribution with the prior distribution, applying it directly to the design of modern GAN models is not desirable.

Algorithm 2 The proposed SWGAN algorithm
Require: the number of dual SOT blocks m, the batch size n, the generator G, the discriminator D = S_d ∘ E, and training hyperparameters.
repeat
  Sample real data x from P_X
  Sample Gaussian noise z from N(0, 1)
  Sample two vectors \mu_1, \mu_2 from the uniform distribution U[0, 1] such that for each i
    \hat{x}_i = (1 - \mu_{1,i}) x_i + \mu_{1,i} G(z_i)
    \hat{y}_i = (1 - \mu_{2,i}) E(x_i) + \mu_{2,i} E(G(z_i))
  Update G by descending -1/n Σ_{i=1}^{n} D(G(z_i))
  Update D = S_d ∘ E by descending -1/n Σ_{i=1}^{n} (D(x_i) - D(G(z_i))) + 1/n Σ_{i=1}^{n} λ_1 ||∇_{\hat{x}_i} E(\hat{x}_i)||_2^2 + 1/n Σ_{i=1}^{n} λ_2 (||∇_{\hat{y}_i} S_d(\hat{y}_i)||_2 - k)^2
until convergence

This is because the standard GAN framework relies on the adversarial training of a generator and a discriminator. The latter is typically required to explicitly compute a useful distance between the fake and real data distributions, while the proposed primal SOT blocks transfer a source distribution to a target distribution by only implicitly measuring the SWD. To address this issue, we resort to the dual form of SWD by extending the design of the SOT blocks to a dual version. Though a k-Lipschitz gradient penalty may not be sufficient in high-dimensional space, it is relatively easy to satisfy the k-Lipschitz constraint on a one-dimensional space, which is a potential advantage of applying the dual form of 1-SWD, i.e.,

\int_{S^{N-1}} \left( \sup_{f \in \mathrm{Lip}_1} E_{X_\theta \sim P_{X_\theta}}[f(X_\theta)] - E_{Y_\theta \sim P_{Y_\theta}}[f(Y_\theta)] \right) d\theta,   (12)

where P_{X_\theta}, P_{Y_\theta} are the marginal distributions obtained by the Radon transform \mathcal{R} (Eq. 6). Thus, instead of approximating the dual of the N-dimensional WD (Arjovsky et al., 2017), we propose a sliced version of Wasserstein GANs (SWGAN) to estimate the duals of the N one-dimensional marginal distributions required by the 1-SWD. Since the real data distribution is supported on a low-dimensional manifold, we follow the setting of the classic GAN generator and first encode the n input data points x = [x_1, \ldots, x_n] \in R^{N \times n} into lower-dimensional latent codes y = [y_1, \ldots, y_n] \in R^{K \times n} by E(x) = y, where E indicates the encoder. Then, we apply the dual SOT block to approximate the optimal f \in \mathrm{Lip}_1 in Eq. 12.

Step (a) of Fig. 1: We first project the latent codes y with the Radon transform matrix \theta = [\theta_1, \ldots, \theta_K] \in R^{K \times K}, that is, y \mapsto \theta^T y, which corresponds to the Radon transform projecting the K-dimensional distribution onto K one-dimensional marginal distributions.

Step (c) of Fig. 1: Then we compute the k-Lipschitz mapping function of the dual SOT block as follows:

S_d(y) = \begin{bmatrix} \phi(\lambda_1(\theta_1^T y) + b_1) \\ \vdots \\ \phi(\lambda_K(\theta_K^T y) + b_K) \end{bmatrix},   (13)

where \theta = [\theta_1, \ldots, \theta_K] \in R^{K \times K} is the Radon transform matrix, \phi is an activation function, and \lambda_i, b_i are scalars. Supported by the universal approximation theorem for neural networks (Hornik, 1991), it is expected that with the sum of a few dual SOT blocks we can well approximate the one-dimensional dual in Eq. 12 with respect to an arbitrary angle. This also relates to the fact that Eq. 13 can easily satisfy the k-Lipschitz constraint under a k-Lipschitz gradient penalty, as long as we use a reasonable activation function. Eventually, by computing S_d(y) = \frac{1}{K} \sum_{i=1}^{K} \phi(\lambda_i(\theta_i^T y) + b_i) to approximate the integral of Eq. 12, we obtain our complete discriminator D = S_d ∘ E. To avoid gradient explosion and vanishing for S_d, we implicitly constrain the gradients of S_d by imposing the gradient regularizer on the entire D.
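Before turning to the final objective, here is a minimal PyTorch-style sketch of one dual SOT block (Eq. 13) together with the averaging that approximates the integral of Eq. 12. This is our own simplified reading; the orthogonality of theta is assumed to be maintained by the Stiefel update of Sec. 3.4, and all names are illustrative.

```python
import torch
import torch.nn as nn

class DualSOTBlock(nn.Module):
    def __init__(self, k_dim):
        super().__init__()
        # Radon transform matrix theta, initialized orthogonal (kept so by the Stiefel update).
        self.theta = nn.Parameter(torch.linalg.qr(torch.randn(k_dim, k_dim))[0])
        self.lam = nn.Parameter(torch.ones(k_dim))    # per-direction scales lambda_i
        self.bias = nn.Parameter(torch.zeros(k_dim))  # per-direction offsets b_i
        self.act = nn.LeakyReLU(0.2)

    def forward(self, y):
        # y: latent codes of shape (n, K). Project onto K one-dimensional marginals,
        # apply phi(lambda_i * theta_i^T y + b_i), then average over directions (Eq. 13).
        proj = y @ self.theta                         # column i holds theta_i^T y
        return self.act(self.lam * proj + self.bias).mean(dim=1)
```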
The final objective is thus as follows:

\min_G \max_D E_{X \sim P_X}[D(X)] - E_{Z \sim P_Z}[D(G(Z))] + \lambda_1 E_{\hat{X} \sim P_{\hat{X}}}[\|\nabla_{\hat{X}} E(\hat{X})\|_2^2] + \lambda_2 E_{\hat{Y} \sim P_{\hat{Y}}}[(\|\nabla_{\hat{Y}} S_d(\hat{Y})\|_2 - k)^2],   (14)

where we sample \hat{X} and \hat{Y} following (Gulrajani et al., 2017) (see Alg. 2), \lambda_1, \lambda_2 are coefficients balancing the penalty terms, and \lambda_2 also absorbs the scale k arising from the k-Lipschitz constraint.

3.4. Training for SWAE and SWGAN

Given that the Radon transform matrix \theta should remain orthogonal throughout training, we cannot directly apply the standard optimization algorithm. Meanwhile, it is widely known that the space of orthogonal matrices is a Stiefel manifold². Hence, we need to update \theta on the curved manifold instead of in flat Euclidean space. Building upon the manifold-valued weight update rule studied in (Huang & Van Gool, 2017), we generalize the optimization algorithm to Stiefel manifolds. Following standard optimization on Riemannian manifolds (Absil et al., 2009), we first employ parallel transport to transport the Euclidean gradient in the tangent space at the anchored orthogonal matrix \theta^t to the tangent space at the orthogonal matrix \theta^{t+1}. The normal component of the Euclidean gradient \nabla L^{(k)}_{\theta^t} is then subtracted from the Euclidean gradient itself, where L is the loss of the k-th layer (for simplicity, we drop the index k in the following). Subsequently, searching along the tangential direction leads to the update in the tangent space of the Stiefel manifold. In the end, the resulting update is projected back onto the Stiefel manifold with a retraction operation \Gamma. For more details about the Riemannian geometry of Stiefel manifolds and the retraction operation on Riemannian manifolds, we refer the readers to (Edelman et al., 1998; Absil et al., 2009). Accordingly, the update of the current orthogonal matrix \theta^t on the Stiefel manifold takes the following form:

\tilde{\nabla} L_{\theta^t} = \nabla L_{\theta^t} - \nabla L_{\theta^t} (\theta^t)^T \theta^t,   (15)

\theta^{t+1} = \Gamma(\theta^t - \lambda \, \Omega(\tilde{\nabla} L_{\theta^t})),   (16)

where \Gamma denotes the retraction operation, which corresponds to a QR decomposition, \lambda is the learning rate, \Omega(\cdot) denotes the standard optimizer update, and \nabla L_{\theta^t} (\theta^t)^T \theta^t is the normal component of the Euclidean gradient \nabla L_{\theta^t}, which can be computed by conventional backpropagation. By experimental study, we favor updating the Radon transform matrices with the Adam optimizer (Kingma & Ba, 2014) generalized to the Stiefel manifold as discussed above, while the rest of the weights are updated by standard Adam optimization.

² A compact Stiefel manifold St(d, D) is the set of d-dimensional orthogonal matrices in R^D.
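A minimal NumPy sketch of one such update (Eqs. 15-16) is given below; it uses plain gradient descent in place of the generalized Adam rule, and the function and its arguments are illustrative only.

```python
import numpy as np

def stiefel_step(theta, euclid_grad, lr=1e-4):
    """One update of an orthogonal Radon matrix theta (K x K) on the Stiefel manifold."""
    # Eq. 15: remove the normal component to obtain the tangential (Riemannian) gradient.
    riem_grad = euclid_grad - euclid_grad @ theta.T @ theta
    # Eq. 16: descend in the tangent space, then retract back via QR decomposition.
    q, r = np.linalg.qr(theta - lr * riem_grad)
    # Fix the column-sign ambiguity of QR so the retraction stays continuous.
    return q * np.sign(np.diag(r))
```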

4. Experiments

We conduct various experiments on three standard benchmarks, CIFAR-10, CelebA (Liu et al., 2015) and LSUN (Yu et al., 2015), to evaluate the proposed SWAE and SWGAN models. Recently, (Heusel et al., 2017) introduced the Fréchet inception distance (FID) to measure the difference between the fake and real data distributions, and verified that the FID measurement agrees better with human judgment than the inception score (IS) (Salimans et al., 2016). Later, (Lucic et al., 2017) conducted a thorough large-scale investigation of the original GAN and its variants by evaluating their FID scores. Therefore, we not only present visual results but also evaluate FID scores on all datasets to further justify the superiority of our models.

4.1. Model Architectures and Hyperparameters

We compare our SWAE to VAE (Kingma & Welling, 2013) and to WAE-MMD and WAE-GAN (Tolstikhin et al., 2018); the latter is equivalent to AAE (Makhzani et al., 2016) when the OT cost function is c(X, Y) = \|X - Y\|_2^2 (Mescheder et al., 2017). We also compare our SWGAN to DCGAN (Radford et al., 2015), WGAN (Arjovsky et al., 2017), and WGAN-GP (Gulrajani et al., 2017). For the compared methods, we follow the default hyperparameters recommended by the authors. All the AE-based generative models, including our SWAE, use the common convolutional architecture suggested by (Berthelot et al., 2017) for the decoder, with the difference that we use a shallow encoder containing downscaling and a linear transform layer instead of several convolutional blocks (Tab. 1).
As for all GAN models, including our SWGAN, we follow the official implementation of (Gulrajani et al., 2017) and employ the standard ResNet structure (Gulrajani et al., 2017) for the generator and discriminator (Tab. 2). We apply the LeakyReLU activation in the dual SOT block based on experimental tuning. The learning rate of SWGAN and SWAE is determined to be , the number of critic iterations of SWGAN is set to 4 for LSUN and CelebA and 5 for CIFAR-10, and λ_1, λ_2 are set to 20 and 10 by cross validation.

Table 1. Network architecture for SWAE. Encoder: nearest-neighbor downsampling, a linear layer, and primal SOT blocks (latent dimension 128). Decoder: 128-dimensional noise, a linear layer, (Conv, ELU) blocks interleaved with nearest-neighbor upsampling, and a final Conv + tanh layer.

Table 2. Network architecture for SWGAN. Generator: 128-dimensional noise, a linear layer, three [3x3]x2 residual blocks with upsampling, and a final Conv + tanh layer. Discriminator: two [3x3]x2 residual blocks with downsampling, two further [3x3] residual blocks, a linear layer, and dual SOT blocks (dimension 128).

Table 3. FID comparison of the AE-based models (VAE, WAE-MMD, AAE (WAE-GAN), SWAE) and GAN models (DCGAN, WGAN, WGAN-GP, SWGAN) on CIFAR-10, CelebA and LSUN. As studied in (Gulrajani et al., 2017), the original WGAN does not achieve good performance for various architectures. For WGAN results marked with a *, we are unable to reach scores comparable to those reported in (Lucic et al., 2017); however, this does not influence our final conclusion.
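Since FID is the main quantitative metric used in the comparisons above, the following is a small NumPy/SciPy sketch of the Fréchet distance between two sets of precomputed Inception features; extracting the features with the Inception network is assumed to happen elsewhere, and the function name is ours.

```python
import numpy as np
from scipy import linalg

def fid(feat_real, feat_fake):
    """Frechet Inception Distance between two feature matrices of shape (n, d)."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    # ||mu_r - mu_f||^2 + Tr(cov_r + cov_f - 2 (cov_r cov_f)^{1/2})
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # discard tiny imaginary parts from numerical error
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))
```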

Figure 2. Curves of FID vs. iteration (left), cost vs. iteration (middle), and number of SOT blocks vs. FID (right) for SWAE.

Figure 3. Curves of FID vs. iteration (top left), cost vs. iteration (top right), FID vs. number of SOT blocks (bottom left), and FID vs. k-Lipschitz constraint (bottom right) for SWGAN.

Figure 4. Interpolation results of the proposed SWAE (left) and SWGAN (right) models on CelebA.

4.2. Evaluation

Comparing against the state-of-the-art AE-based models, Tab. 3 demonstrates that our proposed SWAE outperforms the pure VAE models by a clear margin, while our FID score is also competitive with the AAE (WAE-GAN) model, which additionally leverages adversarial training from GANs and takes advantage of its better generalization ability. Nevertheless, due to adversarial training, AAE (WAE-GAN) is generally unstable, as studied in (Heusel et al., 2017), while our model trains very stably thanks to a simple l2 reconstruction loss without any regularization (Fig. 2, middle). Comparing with the recently published WAE-MMD method, we observe that our SWAE shows a clear advantage in terms of both visual results and FID scores. This verifies that the primal SOT blocks on the encoding work better than other divergence constraints (e.g., MMD) employed by WAE. As none of the evaluated AE-based models are successful on the LSUN dataset, we do not include those results in the paper.

Compared to the state-of-the-art GAN models, the proposed SWGAN achieves top performance as well (Tab. 3), which quantitatively exhibits the advantages of our models. Lately, (Miyato et al., 2018) reported a competitive FID score of 17.5 on CIFAR-10, while relying on extra label information. To the best of our knowledge, our SWGAN has reached the lowest FID on CIFAR-10 among all existing generative models, whereas on CelebA and LSUN it is mildly outperformed by the GAN model of (Heusel et al., 2017), which employs a sophisticated two time-scale update rule. Furthermore, the visual results reported in Fig. 5 are consistent with the FID scores. In particular, our SWGAN obtains more visually pleasing images than WGAN and WGAN-GP in terms of better facial semantics and sufficient diversity. The same conclusion can be drawn for CIFAR-10 and LSUN as well. This empirically shows that our dual SOT model works better than the original OT model employed in the state-of-the-art WGANs. We believe this is mainly because the independent approximation of the SWD on multiple one-dimensional marginal distributions of the training data is easier than the estimation of the original WD, which works directly on samples of much higher dimension.

In addition, we also study some key properties of our SWAE and SWGAN. First, we show the FID curve and the objective curve during training to verify their effectiveness in terms of both quantitative and qualitative measurement. The first plots of Fig. 2 and Fig. 3 demonstrate that the training of our SWAE and SWGAN is more stable than that of other models in terms of FID. Meanwhile, we can also see that our proposed objective functions faithfully reflect the image quality of the generated samples as the training iterations increase. Second, we evaluate the impact of the number of our designed SOT blocks in terms of the FID metric. As it turns out, merely 3 primal and 1 dual SOT block(s) are sufficient to achieve the top performance (Fig. 2 and
Fig. 3), which confirms our intuition that instead of randomly sampling a long sequence of Radon transform matrices, it is possible to learn a short sequence of them (i.e., by stacking a limited number of SOT blocks) that better matches two distributions. Additionally, we study the impact of the k-Lipschitz constraint for SWGAN. Fig. 3 shows that SWGAN favors a relatively small k. Finally, Fig. 4 shows interpolation results of SWAE and SWGAN, justifying that they are capable of generating a reasonable geometry of the latent manifold.

Figure 5. Visual results of the AE-based models (VAE, WAE-MMD, AAE (WAE-GAN), SWAE; top 2 rows) and GAN models (DCGAN, WGAN, WGAN-GP, SWGAN; bottom 3 rows) on CIFAR-10, CelebA and LSUN.

5. Conclusion

In this paper, we introduced a novel sliced optimal transport model for generative modeling. In particular, we endowed modern AE-based and GAN models with the proposed SOT blocks for a better approximation of either the primal or the dual form of the sliced Wasserstein distance, which serves as a measure of the discrepancy between the model distribution and the data distribution. Both FID scores and qualitative results demonstrated clear advantages over existing models. Future work includes a theoretical analysis of the approximation ratio of our proposed model for the primal and dual forms of SWD, and the extension of our model to the context of progressively growing networks for better generation.

Acknowledgements

We would like to thank Dr. Xianfeng David Gu for his insightful blog about optimal transport theory, and NVIDIA for donating the GPUs used in this work.

References

Absil, P-A, Mahony, R., and Sepulchre, R. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2009.

Arjovsky, Martin, Chintala, Soumith, and Bottou, Léon. Wasserstein generative adversarial networks. In ICML, 2017.

Berthelot, David, Schumm, Tom, and Metz, Luke. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint, 2017.

Bonneel, Nicolas, Rabin, Julien, Peyré, Gabriel, and Pfister, Hanspeter. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51(1):22-45, 2015.

Bonnotte, Nicolas. Unidimensional and evolution methods for optimal transportation. PhD thesis, Paris, 2013.

Edelman, Alan, Arias, Tomás A, and Smith, Steven T. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2), 1998.

Genevay, Aude, Peyré, Gabriel, and Cuturi, Marco. Learning generative models with Sinkhorn divergences. arXiv preprint, 2017.

Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron, and Bengio, Yoshua. Generative adversarial nets. In NIPS, 2014.

Gulrajani, Ishaan, Ahmed, Faruk, Arjovsky, Martin, Dumoulin, Vincent, and Courville, Aaron. Improved training of Wasserstein GANs. In NIPS, 2017.

Heusel, Martin, Ramsauer, Hubert, Unterthiner, Thomas, Nessler, Bernhard, and Hochreiter, Sepp. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, 2017.

Hornik, Kurt. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 1991.

Huang, Zhiwu and Van Gool, Luc. A Riemannian network for SPD matrix learning. In AAAI, 2017.

Karras, Tero, Aila, Timo, Laine, Samuli, and Lehtinen, Jaakko. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2018.

Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint, 2014.

Kingma, Diederik P and Welling, Max. Auto-encoding variational Bayes. arXiv preprint, 2013.

Kolouri, Soheil, Zou, Yang, and Rohde, Gustavo K. Sliced Wasserstein kernels for probability distributions. In CVPR, 2016.

Kolouri, Soheil, Park, Se Rim, Thorpe, Matthew, Slepcev, Dejan, and Rohde, Gustavo K. Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Processing Magazine, 34(4):43-59, 2017.

Liu, Ziwei, Luo, Ping, Wang, Xiaogang, and Tang, Xiaoou. Deep learning face attributes in the wild. In ICCV, 2015.

Liu, Zixia, Wang, Liqiang, and Gong, Boqing. Improving the improved training of Wasserstein GANs. In ICLR, 2018.

Lucic, Mario, Kurach, Karol, Michalski, Marcin, Gelly, Sylvain, and Bousquet, Olivier. Are GANs created equal? A large-scale study. arXiv preprint, 2017.

Makhzani, Alireza, Shlens, Jonathon, Jaitly, Navdeep, Goodfellow, Ian, and Frey, Brendan. Adversarial autoencoders. In ICLR, 2016.

Mao, Xudong, Li, Qing, Xie, Haoran, Lau, Raymond YK, Wang, Zhen, and Smolley, Stephen Paul. Least squares generative adversarial networks. In ICCV, 2017.

Mescheder, Lars, Nowozin, Sebastian, and Geiger, Andreas. Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks. In ICML, 2017.

Miyato, Takeru, Kataoka, Toshiki, Koyama, Masanori, and Yoshida, Yuichi. Spectral normalization for generative adversarial networks. In ICLR, 2018.
Pitié, François, Kokaram, Anil C, and Dahyot, Rozenn. Automated colour grading using colour distribution transfer. CVIU, 107(1), 2007.

Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint, 2015.

Rezende, Danilo Jimenez, Mohamed, Shakir, and Wierstra, Daan. Stochastic backpropagation and approximate inference in deep generative models. In ICML, 2014.

Salimans, Tim, Kingma, Diederik P, Welling, Max, et al. Markov chain Monte Carlo and variational inference: Bridging the gap. In ICML, 2015.

Salimans, Tim, Goodfellow, Ian, Zaremba, Wojciech, Cheung, Vicki, Radford, Alec, and Chen, Xi. Improved techniques for training GANs. In NIPS, 2016.

Salimans, Tim, Zhang, Han, Radford, Alec, and Metaxas, Dimitris. Improving GANs using optimal transport. In ICLR, 2018.

Tolstikhin, Ilya, Bousquet, Olivier, Gelly, Sylvain, and Schoelkopf, Bernhard. Wasserstein auto-encoders. In ICLR, 2018.

Villani, Cédric. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008.

Wei, Xiang, Liu, Zixia, Wang, Liqiang, and Gong, Boqing. Improving the improved training of Wasserstein GANs. In ICLR, 2018.

Yu, Fisher, Seff, Ari, Zhang, Yinda, Song, Shuran, Funkhouser, Thomas, and Xiao, Jianxiong. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint, 2015.

Zhao, Junbo, Mathieu, Michael, and LeCun, Yann. Energy-based generative adversarial network. arXiv preprint, 2016.


Controllable Generative Adversarial Network Controllable Generative Adversarial Network arxiv:1708.00598v2 [cs.lg] 12 Sep 2017 Minhyeok Lee 1 and Junhee Seok 1 1 School of Electrical Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul,

More information

arxiv: v1 [cs.cv] 25 Jan 2018

arxiv: v1 [cs.cv] 25 Jan 2018 GENERATIVE ADVERSARIAL NETWORKS USING ADAPTIVE CONVOLUTION Nhat M. Nguyen, Nilanjan Ray Department of Computing Science University of Alberta Edmonton, Alberta T6G 2R3 Canada {nmnguyen,nray1}@ualberta.ca

More information

Unpaired Multi-Domain Image Generation via Regularized Conditional GANs

Unpaired Multi-Domain Image Generation via Regularized Conditional GANs Unpaired Multi-Domain Image Generation via Regularized Conditional GANs Xudong Mao and Qing Li Department of Computer Science, City University of Hong Kong xudong.xdmao@gmail.com, itqli@cityu.edu.hk Abstract

More information

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models [Supplemental Materials] 1. Network Architecture b ref b ref +1 We now describe the architecture of the networks

More information

Towards Principled Methods for Training Generative Adversarial Networks. Martin Arjovsky & Léon Bottou

Towards Principled Methods for Training Generative Adversarial Networks. Martin Arjovsky & Léon Bottou Towards Principled Methods for Training Generative Adversarial Networks Martin Arjovsky & Léon Bottou Unsupervised learning - We have samples from an unknown distribution Unsupervised learning - We have

More information

DEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE MACHINE LEARNING & DATA MINING - LECTURE 17

DEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE MACHINE LEARNING & DATA MINING - LECTURE 17 DEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE 155 - MACHINE LEARNING & DATA MINING - LECTURE 17 GENERATIVE MODELS DATA 3 DATA 4 example 1 DATA 5 example 2 DATA 6 example 3 DATA 7 number of

More information

19: Inference and learning in Deep Learning

19: Inference and learning in Deep Learning 10-708: Probabilistic Graphical Models 10-708, Spring 2017 19: Inference and learning in Deep Learning Lecturer: Zhiting Hu Scribes: Akash Umakantha, Ryan Williamson 1 Classes of Deep Generative Models

More information

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Kihyuk Sohn 1 Sifei Liu 2 Guangyu Zhong 3 Xiang Yu 1 Ming-Hsuan Yang 2 Manmohan Chandraker 1,4 1 NEC Labs

More information

Deep Hybrid Discriminative-Generative Models for Semi-Supervised Learning

Deep Hybrid Discriminative-Generative Models for Semi-Supervised Learning Volodymyr Kuleshov 1 Stefano Ermon 1 Abstract We propose a framework for training deep probabilistic models that interpolate between discriminative and generative approaches. Unlike previously proposed

More information

Bridging Theory and Practice of GANs

Bridging Theory and Practice of GANs MedGAN ID-CGAN Progressive GAN LR-GAN CGAN IcGAN b-gan LS-GAN AffGAN LAPGAN LSGAN InfoGAN CatGAN SN-GAN DiscoGANMPM-GAN AdaGAN AMGAN igan IAN CoGAN Bridging Theory and Practice of GANs McGAN Ian Goodfellow,

More information

arxiv: v4 [cs.lg] 1 May 2018

arxiv: v4 [cs.lg] 1 May 2018 Controllable Generative Adversarial Network arxiv:1708.00598v4 [cs.lg] 1 May 2018 Minhyeok Lee School of Electrical Engineering Korea University Seoul, Korea 02841 suam6409@korea.ac.kr Abstract Junhee

More information

When Variational Auto-encoders meet Generative Adversarial Networks

When Variational Auto-encoders meet Generative Adversarial Networks When Variational Auto-encoders meet Generative Adversarial Networks Jianbo Chen Billy Fang Cheng Ju 14 December 2016 Abstract Variational auto-encoders are a promising class of generative models. In this

More information

arxiv: v2 [cs.cv] 6 Dec 2017

arxiv: v2 [cs.cv] 6 Dec 2017 Arbitrary Facial Attribute Editing: Only Change What You Want arxiv:1711.10678v2 [cs.cv] 6 Dec 2017 Zhenliang He 1,2 Wangmeng Zuo 4 Meina Kan 1 Shiguang Shan 1,3 Xilin Chen 1 1 Key Lab of Intelligent Information

More information

Multi-Modal Generative Adversarial Networks

Multi-Modal Generative Adversarial Networks Multi-Modal Generative Adversarial Networks By MATAN BEN-YOSEF Under the supervision of PROF. DAPHNA WEINSHALL Faculty of Computer Science and Engineering THE HEBREW UNIVERSITY OF JERUSALEM A thesis submitted

More information

ANY image data set only covers a fixed domain. This

ANY image data set only covers a fixed domain. This Extra Domain Data Generation with Generative Adversarial Nets Luuk Boulogne Bernoulli Institute Department of Artificial Intelligence University of Groningen Groningen, The Netherlands lhboulogne@gmail.com

More information

IMPROVING SAMPLING FROM GENERATIVE AUTOENCODERS WITH MARKOV CHAINS

IMPROVING SAMPLING FROM GENERATIVE AUTOENCODERS WITH MARKOV CHAINS IMPROVING SAMPLING FROM GENERATIVE AUTOENCODERS WITH MARKOV CHAINS Antonia Creswell, Kai Arulkumaran & Anil A. Bharath Department of Bioengineering Imperial College London London SW7 2BP, UK {ac2211,ka709,aab01}@ic.ac.uk

More information

DCGANs for image super-resolution, denoising and debluring

DCGANs for image super-resolution, denoising and debluring DCGANs for image super-resolution, denoising and debluring Qiaojing Yan Stanford University Electrical Engineering qiaojing@stanford.edu Wei Wang Stanford University Electrical Engineering wwang23@stanford.edu

More information

arxiv: v1 [cs.gr] 27 Dec 2018

arxiv: v1 [cs.gr] 27 Dec 2018 Sampling using Neural Networks for colorizing the grayscale images arxiv:1812.10650v1 [cs.gr] 27 Dec 2018 Wonbong Jang Department of Statistics London School of Economics London, WC2A 2AE w.jang@lse.ac.uk

More information

GENERATIVE ADVERSARIAL NETWORKS FOR IMAGE STEGANOGRAPHY

GENERATIVE ADVERSARIAL NETWORKS FOR IMAGE STEGANOGRAPHY GENERATIVE ADVERSARIAL NETWORKS FOR IMAGE STEGANOGRAPHY Denis Volkhonskiy 2,3, Boris Borisenko 3 and Evgeny Burnaev 1,2,3 1 Skolkovo Institute of Science and Technology 2 The Institute for Information

More information

The Amortized Bootstrap

The Amortized Bootstrap Eric Nalisnick 1 Padhraic Smyth 1 Abstract We use amortized inference in conjunction with implicit models to approximate the bootstrap distribution over model parameters. We call this the amortized bootstrap,

More information

Image Restoration with Deep Generative Models

Image Restoration with Deep Generative Models Image Restoration with Deep Generative Models Raymond A. Yeh *, Teck-Yian Lim *, Chen Chen, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do Department of Electrical and Computer Engineering, University

More information

Symmetric Variational Autoencoder and Connections to Adversarial Learning

Symmetric Variational Autoencoder and Connections to Adversarial Learning Symmetric Variational Autoencoder and Connections to Adversarial Learning Liqun Chen 1 Shuyang Dai 1 Yunchen Pu 1 Erjin Zhou 4 Chunyuan Li 1 Qinliang Su 2 Changyou Chen 3 Lawrence Carin 1 1 Duke University,

More information

Deep Fakes using Generative Adversarial Networks (GAN)

Deep Fakes using Generative Adversarial Networks (GAN) Deep Fakes using Generative Adversarial Networks (GAN) Tianxiang Shen UCSD La Jolla, USA tis038@eng.ucsd.edu Ruixian Liu UCSD La Jolla, USA rul188@eng.ucsd.edu Ju Bai UCSD La Jolla, USA jub010@eng.ucsd.edu

More information

IMPLICIT AUTOENCODERS

IMPLICIT AUTOENCODERS IMPLICIT AUTOENCODERS Anonymous authors Paper under double-blind review ABSTRACT In this paper, we describe the implicit autoencoder (IAE), a generative autoencoder in which both the generative path and

More information

Geometric Enclosing Networks

Geometric Enclosing Networks Geometric Enclosing Networks Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung Faculty of Information Technology, Monash University Center for Pattern Recognition and Data Analytics, Deakin University,

More information

Structured GANs. Irad Peleg 1 and Lior Wolf 1,2. Abstract. 1. Introduction. 2. Symmetric GANs Related Work

Structured GANs. Irad Peleg 1 and Lior Wolf 1,2. Abstract. 1. Introduction. 2. Symmetric GANs Related Work Structured GANs Irad Peleg 1 and Lior Wolf 1,2 1 Tel Aviv University 2 Facebook AI Research Abstract We present Generative Adversarial Networks (GANs), in which the symmetric property of the generated

More information

arxiv: v12 [cs.cv] 10 Jun 2018

arxiv: v12 [cs.cv] 10 Jun 2018 arxiv:1711.06491v12 [cs.cv] 10 Jun 2018 High-Resolution Deep Convolutional Generative Adversarial Networks J. D. Curtó,1,2,3,4, I. C. Zarza,1,2,3,4, F. De La Torre 2, I. King 1, and M. R. Lyu 1 1 Dept.

More information

Deep Learning With Noise

Deep Learning With Noise Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models

Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models Aditya Grover, Manik Dhar, Stefano Ermon Computer Science Department Stanford University {adityag, dmanik, ermon}@cs.stanford.edu

More information

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different

More information

Semantic Segmentation. Zhongang Qi

Semantic Segmentation. Zhongang Qi Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in

More information