Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions


Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions
Zhilin Zheng 1,2, Li Sun 1,2
1 Shanghai Key Laboratory of Multidimensional Information Processing, 2 East China Normal University
@stu.ecnu.edu.cn, sunli@ee.ecnu.edu.cn

Abstract
VAE requires the standard Gaussian distribution as a prior in the latent space. Since all codes tend to follow the same prior, it often suffers from the so-called posterior collapse. To avoid this, this paper introduces a class specific distribution for the latent code. But different from CVAE, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, z_s and z_u, for a single input. We apply two separated encoders to map the input into z_s and z_u respectively, and then give the concatenated code to the decoder to reconstruct the input. The label irrelevant code z_u represents the common characteristics of all inputs, hence it is constrained by the standard Gaussian, and its encoder is trained in the amortized variational inference way, like VAE. Meanwhile, z_s is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters of the Gaussian components in the z_s encoder are optimized by the label supervision in a global stochastic way. In theory, we show that our method is actually equivalent to adding a KL divergence term on the joint distribution of z_s and the class label, and that it can directly increase the mutual information between z_s and the label. Our model can also be extended to GAN by adding a discriminator in the pixel domain so that it produces high quality and diverse images.

1. Introduction
Learning a deep generative model for structured image data is difficult because this task is not simply modeling a many-to-one mapping function such as classification; instead, it is often required to generate diverse outputs for similar codes sampled from a simple distribution. Furthermore, an image x in the high dimensional space often lies on a complex manifold, thus the generative model should capture the underlying data distribution p(x). Basically, the Variational Auto-Encoder (VAE) [34, 20] and the Generative Adversarial Network (GAN) [13, 25] are two strategies for structured data generation. In VAE, the encoder p_φ(z|x) maps the data x into the code z in the latent space. The decoder, represented by p_θ(x|z), is given a latent code z sampled from a distribution specified by the encoder and tries to reconstruct x. The encoder and decoder in VAE are trained together mainly based on the data reconstruction loss. At the same time, it is required to regularize the distribution p_φ(z|x) to be simple (e.g. Gaussian) based on the Kullback-Leibler (KL) divergence between p_φ(z|x) and q(z) = N(0, I), so that sampling in the latent space is easy. Optimization for VAE is quite stable, but results from it are blurry, mainly because the posterior defined by p_φ(z|x) is not complex enough to capture the true posterior, which is also known as posterior collapse. On the other hand, GAN treats the data generation task as a min/max game between a generator G(z) and a discriminator D(x).
The adversarial loss computed from the discriminator makes the generated image more realistic, but the training becomes more unstable. In [10, 22, 28], VAE and GAN are integrated so that they can benefit from each other. Both VAE and GAN work in an unsupervised way without giving any condition of the label on the generated image. Instead, conditional VAE (CVAE) [39, 3] extends VAE by showing the label to both the encoder and the decoder. It learns the data distribution conditioned on the given label. Hence, the encoder and decoder become p_φ(z|x, c) and p_θ(x|z, c). Similarly, in conditional GAN (cGAN) [9, 18, 33, 30] the label c is given to both the generator G(z, c) and the discriminator D(x, c). Theoretically, feeding the label to either the encoder in CVAE or the decoder in CVAE or cGAN helps increase the mutual information between the generated x and the label. Thus, it can improve the quality of the generated image.

This paper deals with the image generation problem in VAE with two separate encoders. For a single input x, our goal is to disentangle the latent space code z, computed by the encoders, into the label relevant dimensions z_s and the irrelevant ones z_u. We emphasize the difference between z_s and z_u and their corresponding encoders. For z_s, since the label is known during training, it should be more accurate and specific, while without any label constraint z_u should be general.

Specifically, the two encoders are constrained with different priors on their posterior distributions p_{φ_s}(z_s|x) and p_{φ_u}(z_u|x). Similar to VAE or CVAE, in which the full code z is label irrelevant, the prior for z_u is also chosen as N(0, I). But different from previous works, the prior q(z_s|x) becomes complex so as to capture the label relevant distribution. From the decoder's perspective, it takes the concatenation of z_s and z_u to reconstruct the input x. Here the distinction from CVAE and cGAN is that they use a fixed, one-hot encoded label, while our work applies z_s, which is considered a variational, soft label. Note that there are two stages for training our model. First, the encoder for z_s is trained for the classification task under the supervision of the label. Here, instead of the softmax cross entropy loss, the Gaussian mixture cross entropy loss proposed in [44] is adopted, since it accumulates the mean µ_c and variance σ_c of samples with the same label c and models them as the Gaussian N(µ_c, σ_c), hence z_s ∼ N(µ_c, σ_c). The first stage specifies the label relevant distribution. In the second stage, the two encoders and the decoder are trained jointly in an end-to-end manner based on the reconstruction loss. Meanwhile, the priors z_s ∼ N(µ_c, σ_c) and z_u ∼ N(0, I) are also considered. The main contribution of this paper lies in the following aspects: (1) for a single input x to the encoder, we provide an algorithm to disentangle the latent space into label relevant and irrelevant dimensions in VAE. Previous works like [15, 4, 37] disentangle the latent space in AE, not VAE, so it is impossible to make inference from their models. Moreover, [27, 4, 23] require at least two inputs for training. (2) We find that the Gaussian mixture loss function is a suitable way to estimate the parameters of the prior distribution, and it can be optimized within the VAE framework. (3) We give both a theoretical derivation and a variety of detailed experiments to explain the effectiveness of our work.

2. Related works
Two types of methods for structured image generation are VAE and GAN. VAE [20] is a type of parametric model defined by p_θ(x|z) and p_φ(z|x), which employs the idea of variational inference to maximize the evidence lower bound (ELBO), as shown in Eq. 1:

\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\,\|\,q(z)) \quad (1)

The right-hand side is the ELBO, which is a lower bound of the log-likelihood. In VAE, a differentiable encoder-decoder pair is connected, parameterized by φ and θ, respectively. E_{q_φ(z|x)}[log p_θ(x|z)] represents the end-to-end reconstruction loss, and D_KL(q_φ(z|x) ‖ q(z)) is the KL divergence between the encoder's output distribution q_φ(z|x) and the prior q(z), which is usually modeled by the standard normal distribution N(0, I). Note that VAE assumes the posterior q_φ(z|x) is Gaussian, and µ and σ are estimated for every single input x by the encoder. This strategy is named amortized variational inference (AVI), and it is more efficient than stochastic variational inference (SVI) [17]. VAE's advantage is that its loss is easy to optimize, but the simple prior in the latent space may not capture complex data patterns, which often leads to mode collapse in the latent space. Moreover, VAE's code is hard to interpret. Thus, many works focus on improving VAE in these two aspects. CVAE [39] adds the label vector as an input for both the encoder and the decoder, so that the latent code and the generated image are conditioned on the label, which potentially prevents the latent collapse.
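As a point of reference for Eq. 1, the following is a brief, hedged PyTorch-style sketch of the standard VAE objective with a diagonal-Gaussian posterior and an N(0, I) prior; the framework choice and the module names `encoder` and `decoder` are assumptions for illustration, not part of the paper.

```python
# Minimal sketch of the negative ELBO of Eq. 1: l2 reconstruction + closed-form KL.
import torch

def elbo_loss(encoder, decoder, x):
    mu, logvar = encoder(x)                                   # q_phi(z|x) = N(mu, diag(exp(logvar)))
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization trick
    x_hat = decoder(z)
    recon = torch.sum((x - x_hat) ** 2, dim=list(range(1, x.dim())))   # reconstruction term
    kl = 0.5 * torch.sum(logvar.exp() + mu ** 2 - 1.0 - logvar, dim=1) # D_KL(q(z|x) || N(0,I))
    return (recon + kl).mean()                                # minimize the negative ELBO
```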
On the other hand, β-VAE [16, 7] is an unsupervised approach to latent space disentanglement. It introduces a simple hyper-parameter β to balance the two loss terms in Eq. 1. A scheme named infinite mixture of VAEs is proposed and applied to semi-supervised generation [1]. It uses multiple VAEs and combines them as a non-parametric mixture model. In [19], the semi-amortized VAE is proposed. It combines AVI with SVI in VAE; here SVI estimates the distribution parameters on the whole training set, while the AVI in traditional VAE gives this estimation for a single input. GAN [13] is another technique to model the data distribution p_D(x). It starts from a random z ∼ p(z), where p(z) is simple, e.g. Gaussian, and trains a transform network g_θ(z) with the help of a discriminator D_φ(·) so that p_θ(z) approximates p_D(x). Later works [32, 26, 2, 14, 29] try to stabilize GAN's training. The traditional GAN works in a fully unsupervised manner, while cGAN [18, 33, 30, 6] aims to generate images conditioned on labels. In cGAN, the label is given as an input to both the generator and the discriminator as a condition for the distribution. The encoder-decoder architecture of AE or VAE can also be used in GAN. In ALI [11] and BiGAN [10], the encoder maps x to z, while the decoder reverses it. The discriminator takes the pair of z and x, and is trained to determine whether it comes from the encoder or the decoder in an adversarial manner. In VAE-GAN [22, 24], VAE's generated data are improved by a discriminator. A similar idea is also applied to CVAE in [3]. VAE-GAN is also applied in some specific applications like [4, 12]. Since the code z potentially affects the generated data, some works try to model its effect and disentangle the dimensions of z. InfoGAN [9] reveals the effect of the latent space code c by maximizing the mutual information between c and the synthetic data g_θ(z, c). Its generator outputs g_θ(z, c), which is inspected by the discriminator D_φ(·); D_φ(·) also tries to reconstruct the code c. In [27], the latent dimension is disentangled in VAE based on specified factors and unspecified ones, which is similar to our work. But its encoder takes multiple inputs, and the decoder combines codes from different inputs for reconstruction.

The work in [15] modifies [27] by taking a single input. To stabilize training, its model is built on AE, not VAE, hence it cannot perform variational inference. Other works [37, 4, 23] are also built on AE and take more than two inputs. Moreover, they only apply to a particular domain like faces [37, 4] or image-to-image translation [23], while our work is built on VAE and takes only a single input for a more general case.

3. Proposed method
We propose an image generation algorithm based on VAE which divides the encoder into two separate ones, one encoding the label relevant representation z_s and the other encoding the label irrelevant information z_u. z_s is learned with the supervision of the categorical class label and is required to follow a Gaussian mixture distribution, while z_u is expected to contain the remaining common information irrelevant to the label and is pushed close to the standard Gaussian N(0, I).

Figure 1. The network architecture. We disentangle class relevant dimensions z_s and class irrelevant dimensions z_u in the latent space. The encoder_s maps the input image x to z_s, and forces z_s to be well classified while following a Gaussian mixture distribution with learned mean µ_c and covariance Σ_c. Meanwhile, the encoder_u extracts z_u from x and pushes it to match the standard Gaussian N(0, I). An adversarial classifier is added on top of z_u to distinguish the class of z_u, while encoder_u tries to fool it. Then z_s and z_u are concatenated and fed into the decoder to obtain x̂ for reconstruction. Adversarial training in the pixel domain is also adopted, with a discriminator added on the images. The forward pass is drawn in solid lines and dashed lines represent back propagation.

3.1. Problem formulation
Given a labeled dataset D_s = {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(N), y^(N))}, where x^(i) is the i-th image and y^(i) ∈ {0, 1, ..., C−1} is the corresponding label, C and N are the number of classes and the size of the dataset, respectively. The goal of VAE is to maximize the ELBO defined in Eq. 1, so that the data log-likelihood log p(x) is also maximized. The key idea is to split the full latent code z into the label relevant dimensions z_s and the irrelevant dimensions z_u, which means z_s fully reflects the class c but z_u does not. Thus the objective can be rewritten as (derived in detail in the Appendix):

\log p(x) = \log \int\!\!\int \sum_c p(x, z_s, z_u, c)\, dz_s\, dz_u \ge \mathbb{E}_{q_\psi(z_s|x),\, q_\phi(z_u|x)}[\log p_\theta(x|z_s, z_u)] - D_{KL}(q_\phi(z_u|x)\,\|\,p(z_u)) - D_{KL}(q_\psi(z_s, c|x)\,\|\,p(z_s, c)) \quad (2)

In Eq. 2, the ELBO becomes 3 terms in our setting. The first term is the negative reconstruction error, where p_θ is the decoder parameterized by θ. It measures whether the latent codes z_s and z_u are informative enough to recover the original data. In practice, the reconstruction error L_re can be defined as the l_2 loss between x and its reconstruction x̂. The second term acts as a regularization term of the label irrelevant branch that pushes q_φ(z_u|x) to match the prior distribution p(z_u), which is illustrated in detail in Section 3.2. The third term matches q_ψ(z_s|x) to a class-specific Gaussian distribution whose mean and covariance are learned with supervision, and it will be further introduced in Section 3.3.

3.2. Label irrelevant branch
Intuitively, we want to disentangle the latent code z into z_s and z_u, and expect z_u to follow a fixed prior distribution which is irrelevant to the label. This regularization is realized by minimizing the KL divergence between q_φ(z_u|x) and the prior p(z_u), as illustrated in Eq. 3.
More specifically, q_φ(z_u|x) is a Gaussian distribution whose mean µ and diagonal covariance Σ are the outputs of encoder_u parameterized by φ, and p(z_u) is simply set to N(0, I). Hence the KL regularization term is:

L_{kl} = D_{KL}[\mathcal{N}(\mu, \Sigma)\,\|\,\mathcal{N}(0, I)] \quad (3)

Note that Eq. 3 has a closed form, which is easy to compute. To ensure good disentanglement between z_u and z_s, we introduce adversarial learning in the latent space, as in AAE [25], to drive the label relevant information out of z_u. To do this, an adversarial classifier is added on top of z_u, which is trained to classify the category of z_u with the cross entropy loss shown in Eq. 4:

L_C^{adv} = \mathbb{E}_{q_\phi(z_u|x)}\Big[-\sum_c \mathbb{I}(c = y)\, \log q_\omega(c|z_u)\Big] \quad (4)

where I(c = y) is the indicator function, and q_ω(c|z_u) is the softmax probability output by the adversarial classifier parameterized by ω.
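Below is a hedged PyTorch-style sketch of the label-irrelevant regularizers: the closed-form KL of Eq. 3 and the adversarial classifier/encoder cross-entropy losses of Eqs. 4-5. The classifier module `adv_clf`, tensor layouts, and the use of `detach()` when updating only the classifier are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    # Eq. 3: D_KL[ N(mu, diag(exp(logvar))) || N(0, I) ], summed over latent dims
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=1).mean()

def adversarial_classifier_loss(adv_clf, z_u, y):
    # Eq. 4: train the classifier to predict the class from z_u (encoder_u frozen here)
    logits = adv_clf(z_u.detach())
    return F.cross_entropy(logits, y)

def adversarial_encoder_loss(adv_clf, z_u):
    # Eq. 5: train encoder_u to fool the classifier, i.e. push its prediction
    # towards the uniform distribution over the C categories
    log_prob = F.log_softmax(adv_clf(z_u), dim=1)
    return -log_prob.mean(dim=1).mean()     # -(1/C) * sum_c log q_w(c|z_u)
```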

Meanwhile, encoder_u is trained to fool the classifier, hence the target distribution becomes uniform over all categories, i.e. 1/C. The corresponding cross entropy loss is defined in Eq. 5:

L_E^{adv} = \mathbb{E}_{q_\phi(z_u|x)}\Big[-\sum_c \frac{1}{C}\, \log q_\omega(c|z_u)\Big] \quad (5)

3.3. Label relevant branch
Inspired by the GM loss [44], we expect z_s to follow a Gaussian mixture distribution, expressed in Eq. 6, where µ_c and Σ_c are the mean and covariance of the Gaussian distribution for class c, and p(c) is the prior probability, which is simply set to 1/C for all categories. For simplicity, we ignore the correlation among different dimensions of z_s, hence Σ_c is assumed to be diagonal.

p(z_s) = \sum_c p(z_s|c)\, p(c) = \sum_c \mathcal{N}(z_s; \mu_c, \Sigma_c)\, p(c) \quad (6)

Recall that in Eq. 2 the KL divergence between q_ψ(z_s, c|x) and p(z_s, c) is minimized. If q_ψ(z_s|x) is formulated as a Gaussian distribution with Σ → 0 and mean ẑ_s output by encoder_s, which is actually a Dirac delta function δ(z_s − ẑ_s), the KL divergence turns out to be the likelihood regularization term L_lkd in Eq. 7, which is proved in the Appendix. Here µ_y and Σ_y are the mean and covariance specified by the label y.

L_{lkd} = -\log \mathcal{N}(\hat{z}_s; \mu_y, \Sigma_y) \quad (7)

Furthermore, we want z_s to contain as much label information as possible, so the mutual information between z_s and the class c is added to the maximization objective. We prove in the Appendix that this is equal to minimizing the cross-entropy loss between the posterior probability q(c|z_s) and the label, which is exactly the classification loss L_cls in the GM loss, as shown in Eq. 8:

L_{cls} = \mathbb{E}_{q_\psi(z_s|x)}\Big[-\sum_c \mathbb{I}(c = y)\, \log q(c|z_s)\Big] = -\log \frac{\mathcal{N}(\hat{z}_s; \mu_y, \Sigma_y)\, p(y)}{\sum_k \mathcal{N}(\hat{z}_s; \mu_k, \Sigma_k)\, p(k)} \quad (8)

These two terms are added up to form the GM loss in Eq. 9; L_GM is finally used to train encoder_s.

L_{GM} = L_{cls} + \lambda_{lkd}\, L_{lkd} \quad (9)

3.4. The decoder and the adversarial discriminator
The latent codes z_s and z_u output by encoder_s and encoder_u are first concatenated together and then given to the decoder to reconstruct the input x as x̂. Here the decoder is denoted by p_θ(x|z), with its parameter θ learned from the l_2 reconstruction error L_re. To synthesize a high quality x̂, we also employ adversarial training in the pixel domain. Specifically, a discriminator D_θd(x, c), with adversarial training on its parameter θ_d, is used to improve x̂. Here the label c is utilized in D_θd as in [30]. The adversarial training loss for the discriminator can be formulated as in Eq. 10,

L_D^{adv} = -\mathbb{E}_{x \sim P_r}[\log D_{\theta_d}(x, c)] - \mathbb{E}_{z_u \sim \mathcal{N}(0, I),\, z_s \sim p(z_s|c)}[\log(1 - D_{\theta_d}(G(z_s, z_u), c))] \quad (10)

while this loss becomes L_{GD}^{adv} = -\mathbb{E}_{z_u \sim \mathcal{N}(0, I),\, z_s \sim p(z_s|c)}[\log D_{\theta_d}(G(z_s, z_u), c)] for the generator. Note that here G(z_s, z_u) is the decoder and p(z_s|c) is defined in Eq. 6.

3.5. Training algorithm
The training details are illustrated in Algorithm 1. Encoder_s, modeled by q_ψ, extracts the label relevant code z_s. Encoder_s is trained with L_GM and L_re, encouraging z_s to be label dependent and to follow a learned Gaussian mixture distribution. Meanwhile, Encoder_u, represented by q_φ, is intended to extract the class irrelevant code z_u. It is trained with L_kl, L_E^adv and L_re to make z_u irrelevant to the label and close to N(0, I). The adversarial classifier parameterized by ω is learned to classify z_u using L_C^adv. Then the decoder p_θ generates the reconstructed image from the combined feature of z_s and z_u with the loss L_re. In the training process, a 2-stage alternating training algorithm is adopted. First, encoder_s is updated using L_GM to learn the mean µ_c and covariance Σ_c of the prior p(z_s|c).
Then, the two encoders and the decoder are trained jointly to reconstruct images while the distributions of z_s and z_u are considered.

3.6. Application in semi-supervised generation
Given L unlabelled extra data D_u = {x^(N+1), x^(N+2), ..., x^(N+L)}, we now use our architecture for semi-supervised generation, in which the labels y^(N+i) of x^(N+i) in D_u are not presented. Here we hold the assumption that D_u is in the same domain as the fully supervised D_s, but the absent y^(N+i) may either satisfy y^(N+i) ∈ {0, 1, ..., C−1} or fall out of the predefined range. In other words, if the absent y^(N+i) is in the predefined range, its z_s follows the same Gaussian mixture distribution as in Eq. 6. Otherwise, z_s should follow an ambiguous Gaussian distribution defined in Eq. 11:

\mu_t = \sum_c p(c)\, \mu_c, \qquad \sigma_t^2 = \sum_c p(c)\, \sigma_c^2 + \sum_c p(c)\, \mu_c^2 - \Big(\sum_c p(c)\, \mu_c\Big)^2 \quad (11)
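The following hedged PyTorch-style sketch illustrates the label-relevant losses of Eqs. 7-9 and the "ambiguous" Gaussian moments of Eq. 11; the tensors `mu` (C×D class means) and `logvar` (C×D diagonal log-variances), and the uniform prior p(c) = 1/C, are illustrative assumptions rather than the authors' implementation.

```python
import math
import torch
import torch.nn.functional as F

def class_log_likelihood(z_s, mu, logvar):
    # log N(z_s; mu_c, Sigma_c) for every class c, diagonal covariance.
    # z_s: (B, D), mu/logvar: (C, D)  ->  (B, C)
    diff = z_s.unsqueeze(1) - mu.unsqueeze(0)
    return -0.5 * (logvar + diff.pow(2) / logvar.exp() + math.log(2 * math.pi)).sum(dim=2)

def gm_loss(z_s, y, mu, logvar, lambda_lkd=0.1):
    log_lik = class_log_likelihood(z_s, mu, logvar)            # (B, C)
    # Eq. 8: with a uniform prior p(c) = 1/C, the posterior q(c|z_s) is a softmax
    # over the per-class log-likelihoods, so L_cls is an ordinary cross entropy.
    l_cls = F.cross_entropy(log_lik, y)
    # Eq. 7: negative log-likelihood under the Gaussian of the true class y.
    l_lkd = -log_lik.gather(1, y.unsqueeze(1)).mean()
    return l_cls + lambda_lkd * l_lkd                          # Eq. 9

def ambiguous_gaussian(mu, logvar):
    # Eq. 11: total mean/variance of the mixture with p(c) = 1/C, used for
    # unlabelled samples whose class may lie outside the known categories.
    mu_t = mu.mean(dim=0)
    var_t = logvar.exp().mean(dim=0) + mu.pow(2).mean(dim=0) - mu_t.pow(2)
    return mu_t, var_t
```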

Algorithm 1. The training process of our proposed architecture.
Require: ψ, φ, θ, ω, θ_d: initial parameters of Encoder_s, Encoder_u, the Decoder, the adversarial classifier on z_u, and the discriminator on x; µ_c and Σ_c: initial mean and covariance of the Gaussian distribution of z_s for each class c; n_gm: the number of iterations of L_GM per end-to-end iteration; λ_re and λ_kl: the weights of L_re and L_kl.
1: while not converged do
2:   for i = 0 to n_gm do
3:     Sample a batch {x, y} from the dataset.
4:     ẑ_s ← Encoder_s(x)
5:     L_GM ← −log q(y|ẑ_s) − λ_lkd log p(ẑ_s|y)
6:     ψ ← ψ − ∇_ψ L_GM
7:     µ_c ← µ_c − ∇_{µ_c} L_GM, c ∈ [0, C−1]
8:     Σ_c ← Σ_c − ∇_{Σ_c} L_GM, c ∈ [0, C−1]
9:   end for
10:  Sample a batch {x, y} from the dataset.
11:  µ, Σ ← Encoder_u(x)
12:  L_kl ← D_KL[N(µ, Σ) ‖ N(0, I)]
13:  Sample ε ∼ N(0, I)
14:  z_u ← Σ^{1/2} ε + µ
15:  L_E^adv ← −(1/C) Σ_c log q_ω(c|z_u)
16:  L_C^adv ← −log q_ω(y|z_u)
17:  ẑ_s ← Encoder_s(x)
18:  L_lkd ← −log N(ẑ_s; µ_y, Σ_y)
19:  x_f ← Decoder(ẑ_s, z_u)
20:  L_re ← (1/2) ‖x − x_f‖²
21:  Sample z_s^p ∼ p(z_s|y), z_u^p ∼ N(0, I)
22:  x_p ← Decoder(z_s^p, z_u^p)
23:  L_D^adv ← −log D_θd(x, y) − log(1 − D_θd(x_f, y)) − log(1 − D_θd(x_p, y))
24:  L_GD^adv ← −log D_θd(x_f, y) − log D_θd(x_p, y)
25:  ψ ← ψ − ∇_ψ(L_re + λ_lkd L_lkd)
26:  φ ← φ − ∇_φ(L_E^adv + λ_kl L_kl + λ_re L_re)
27:  ω ← ω − ∇_ω L_C^adv
28:  θ ← θ − ∇_θ(L_re + L_GD^adv)
29:  θ_d ← θ_d − ∇_θd L_D^adv
30: end while

More specifically, z_s is expected to follow N(µ_t, Σ_t), where µ_t and Σ_t are the total mean and covariance of all the class-specific Gaussian distributions N(µ_c, Σ_c) given in Eq. 6. Here Σ_t is a diagonal matrix with σ_t² as its variance vector, and σ_c² is the variance vector of Σ_c. Hence, the likelihood regularization term becomes L_lkd = −log N(ẑ_s; µ_t, Σ_t). The whole network is trained in an end-to-end manner using the total losses. Note that in this setting the label y is not provided, so L_GM, L_E^adv and L_C^adv are ignored in the training process.

4. Experiments
In this section, experiments are carried out to validate the effectiveness of the proposed method. A toy example is first designed to show that, by disentangling the label relevant and irrelevant codes, our model has the ability to generate more diverse data samples than CVAE-GAN [3]. We then compare the quality of generated images on real image datasets. The latent space is also analyzed. Finally, the experiments on semi-supervised generation and image inpainting show the flexibility of our model, hence it may have many potential applications.

4.1. Toy examples
This section demonstrates our method on a toy example, in which the real data distribution lies in 2D with one dimension (the x axis) being label relevant and the other (the y axis) being irrelevant. The distribution is assumed to be known. There are 3 types of data points, indicated by green, red and blue, belonging to 3 classes. The 2D data points and their corresponding labels are given to our model for variational inference and new sample generation. For comparison, we also give the same training data to CVAE-GAN for the same purpose. The two compared models share similar network settings. In our model, the two encoders are both MLPs with 3 hidden layers of 32, 64, and 64 units. In CVAE-GAN, the encoder has the same structure, but there is only one encoder. The discriminators are exactly the same, also an MLP of 3 hidden layers with 32, 64, and 64 units. Adam is used as the optimization method, with a fixed learning rate applied to both. Each model is trained for 50 epochs, until both converge. The generated samples of each model are plotted in Figure 2.
From Figure 2 we can observe that both models capture the underlying data distribution, and our model converges at a similar rate. The advantage of our model is that it tends to generate more diverse samples, while CVAE-GAN generates samples in a conservative way, in which the label irrelevant dimension stays within a limited value range.

4.2. Analysis on generated image quality
In this section, we compare our method with other generative models in terms of image generation quality. The experiments are conducted on two datasets: FaceScrub [31] and CIFAR-10 [21]. FaceScrub contains 92k training images from 530 different identities. For FaceScrub, the cascaded object detector proposed in [42] is first used to detect faces, and then face alignment is conducted based on SDM proposed in [46]. The detected and cropped faces are resized to a fixed size. In the training process, the Adam optimizer is used. The hyper-parameters λ_lkd, λ_kl, and λ_re are set to 0.1, 10/N_pixel, and 1/N_zu,

respectively. Here, N_pixel is the number of image pixels, and N_zu is the dimension of z_u.

Figure 2. Results on a toy example for our model and CVAE-GAN: (a) the real data distribution; (b) the distributions generated by CVAE-GAN and ours at initialization and after 10, 20, 30, 40, and 50 epochs.

Figure 3. Visualization of generated images of different models on FaceScrub and CIFAR-10: (a) real samples, (b) CVAE, (c) cGAN, (d) CVAE-GAN, (e) ours.

Since our method incorporates the label for training, popular generative networks conditioned on the label, i.e. CVAE [39], CVAE-GAN [3], and cGAN [30], are chosen for comparison. For CVAE, CVAE-GAN and cGAN, we randomly generate samples of class c by first sampling z ∼ N(0, I) and then concatenating z with the one-hot vector of c as the input of the decoder/generator. As for ours, z_s ∼ N(µ_c, σ_c) and z_u ∼ N(0, I) are sampled and combined for the decoder to generate samples. Some of the generated images are visualized in Figure 3. It shows that samples generated by CVAE are highly blurred, and cGAN suffers from mode collapse. Samples generated by CVAE-GAN and our method seem to have similar quality, so we refer to two metrics, the Inception Score [36] and intra-class diversity [5], to compare them.

We adopt the Inception Score to evaluate the realism and inter-class diversity of images. Generated images that are close to real images of class y should have a posterior probability p(y|x) with low entropy. Meanwhile, images of diverse classes should have a marginal probability p(y) with high entropy. Hence, the Inception Score, formulated as exp(E_x KL(p(y|x) ‖ p(y))), gets a high value when images are realistic and diverse. To get the conditional class probability p(y|x), we first train a classifier with the Inception-ResNet-v1 [40] architecture on real data. Then we randomly generate 53k samples (100 for each class) of FaceScrub and 5k samples (500 for each class) of CIFAR-10, and feed them to the pre-trained classifier. The marginal p(y) is obtained by averaging all p(y|x). The results are listed in Table 1. We emphasize that our method generates more diverse samples within one class. Since the Inception Score only measures inter-class diversity, the intra-class diversity of samples should also be taken into account. We adopt the metric proposed in [5], which measures the average negative MS-SSIM [45] between all pairs in the generated image set X.
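A small hedged sketch of the Inception Score computation described above, exp(E_x KL(p(y|x) ‖ p(y))); `probs` is assumed to be an (N, num_classes) array of classifier posteriors p(y|x) for the generated samples.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    p_y = probs.mean(axis=0, keepdims=True)                          # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))                                  # higher is better
```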

Table 1. Inception Score of different methods (CVAE [39], cGAN [30], CVAE-GAN [3], and ours) on FaceScrub and CIFAR-10. Please refer to Section 4.2 for more details.

Table 2 shows the intra-class diversity of CVAE-GAN and our method on FaceScrub and CIFAR-10:

d_{intra}(X) = 1 - \frac{1}{|X|^2} \sum_{(x, x') \in X \times X} \mathrm{MS\text{-}SSIM}(x, x') \quad (12)

Table 2. Intra-class diversity of CVAE-GAN [3] and our method on FaceScrub and CIFAR-10. Please refer to Section 4.2 for more details.

4.3. Analysis on disentangled latent space
We now evaluate our proposal on the disentangled latent space, which is represented by the label relevant dimensions z_s and the irrelevant ones z_u. z_s for class c is supposed to capture the variation unique to training images within the label, while z_u should contain the variation in common characteristics shared by all classes. This is validated in the following ways: (1) fixing z_u and varying z_s. In this setting, we directly sample a z_u ∼ N(0, I) and keep it fixed. Then a set of z_s for class c is obtained by first getting a series of random codes sampled from N(0, I) and then mapping them to class c. Specifically, we first sample z_1 ∼ N(0, I) and z_2 ∼ N(0, I). A set of random codes z^(i) is then obtained by linear interpolation, i.e., z^(i) = αz_1 + (1 − α)z_2 with α ∈ [0, 1], and we map each z^(i) to class c with z_s^(i) = z^(i) σ_c + µ_c. Finally, each z_s^(i) is concatenated with the fixed z_u and given to the decoder to get a generated image. (2) Fixing z_s and varying z_u. Similar to (1), we first sample a z_s ∼ N(µ_c, σ_c) from a learned distribution and keep it fixed. Then a set of label irrelevant codes z_u is obtained by linearly interpolating between z_1 and z_2, where z_1 and z_2 are sampled from N(0, I). We conduct experiments on FaceScrub, and the generated images are shown in Figure 4. In Figure 4 (a), each row presents samples generated by linearly transformed z_s of a certain class c and a fixed z_u. All three rows share the same z_u, and each column shares the same random code z^(i); z_s^(i) = z^(i) σ_c + µ_c just maps it to a different class c. It shows that as z_s varies, faces may change differently for different identities, e.g., grow a beard, wrinkle, or take off the make-up. In Figure 4 (b), each row presents samples with linearly transformed z_u and a fixed z_s of class c, and each column shares the same z_u. We can see that images within each row change consistently in poses, expressions and illuminations. These two experiments suggest that z_s is relevant to c, while z_u reflects the more common, label irrelevant characteristics.

Figure 4. The generated images obtained by fixing one code and varying the other. In (a), each row shows samples for linearly transformed z_s of a certain class c with a fixed z_u. In (b), each row corresponds to samples for linearly transformed z_u with a fixed z_s of class c.

We are also interested in each dimension of z_u and conduct an experiment by varying a single element in it. We find three dimensions in z_u which reflect meaningful common characteristics, namely the expression, elevation and azimuth.
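The interpolation protocol of Section 4.3 can be sketched as follows; `decoder`, the learned per-class parameters `mu[c]`, `sigma[c]`, and the fixed `z_u` are assumed to come from a trained model, and all names are illustrative rather than the authors' exact code.

```python
import torch

def sweep_label_code(decoder, mu, sigma, c, z_u, steps=8):
    # Fix z_u and sweep z_s within class c, as in one row of Figure 4 (a).
    z1, z2 = torch.randn_like(mu[c]), torch.randn_like(mu[c])
    images = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        z = alpha * z1 + (1.0 - alpha) * z2        # random code on the segment
        z_s = z * sigma[c] + mu[c]                 # map it to class c
        images.append(decoder(torch.cat([z_s, z_u], dim=-1).unsqueeze(0)))
    return torch.cat(images, dim=0)
```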
4.4. Semi-supervised image generation
Following the details in Section 3.6, experiments on semi-supervised image generation are conducted. We find that our method can learn a well disentangled latent representation when unlabelled extra data are available. To validate this, we randomly select 200 identities with about 21k images from the CASIA [47] dataset and remove their labels to form the unlabelled dataset D_u. Note that the identities in D_u are totally different from those in FaceScrub. After training the whole network on the labelled dataset D_s, we finetune it on D_u using the training algorithm illustrated in Section 3.6. To demonstrate the semi-supervised generation results, two different images are given to Encoder_s and Encoder_u to generate the codes z_s and z_u, respectively. Then the decoder is required to synthesize a new image based on the concatenated code from z_s and z_u. Figure 6 shows face synthesis results using images whose identities have not appeared in D_s.
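A minimal hedged sketch of the recombination used for Figure 6: z_s is taken from one image and z_u from another, then the two are decoded together. The trained modules `encoder_s`, `encoder_u` and `decoder`, and the use of the posterior mean of z_u at test time, are assumptions made for illustration.

```python
import torch

@torch.no_grad()
def recombine(encoder_s, encoder_u, decoder, x_identity, x_attribute):
    z_s = encoder_s(x_identity)            # label relevant code (identity)
    mu, logvar = encoder_u(x_attribute)    # label irrelevant posterior
    z_u = mu                               # use the posterior mean at test time
    return decoder(torch.cat([z_s, z_u], dim=1))
```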

The first row and first column show a set of original images providing z_u and z_s respectively, while images in the middle are generated ones using the z_s of the corresponding row and the z_u of the corresponding column. It is obvious that the identity depends on z_s, while other characteristics like the pose, illumination and expressions are reflected in z_u. This semi-supervised generation shows that z_s and z_u can also be disentangled for identities outside the labeled training data D_s, which provides great flexibility for image generation.

Figure 5. The generated images obtained by fixing z_s for each row and varying single dimensions in z_u. We find three different dimensions in z_u which directly cause the variations in expression in (a), elevation in (b), and azimuth in (c).

Figure 6. Face synthesis using images whose identities have not appeared in D_s. Original images providing z_u and z_s are given in the first row and the first column. The images synthesized from each combination of z_u and z_s are shown in the corresponding positions.

4.5. Image inpainting
Our method can also be applied to image inpainting: given a partly corrupted image, we can extract a meaningful latent code to reconstruct the original image. Note that in CVAE-GAN [3] an extra class label must be provided for reconstruction, while it is not needed in our method. In practice, we first corrupt some patches of an image x, namely the right-half, eyes, nose and mouth, and bottom-half regions, then input those corrupted images into the two encoders to get z_s and z_u, and then the reconstructed image x̂ is generated from the combined z_s and z_u. The image inpainting result is obtained by x_inp = M ⊙ x + (1 − M) ⊙ x̂, where M is the binary mask for the corrupted patch. Figure 7 shows the results of image inpainting. CVAE-GAN struggles to complete the images when it comes to a large part of missing regions (e.g., the right-half and bottom-half parts) or pivotal regions of the face (e.g., the eyes), while our method provides visually pleasing results.

Figure 7. Image inpainting results on (a) the right-half face, (b) the eyes, (c) the nose and mouth, and (d) the bottom-half face. Each group shows the original image, the corrupted image, the CVAE-GAN result, and ours.
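A hedged sketch of the inpainting compositing step, x_inp = M ⊙ x + (1 − M) ⊙ x̂. Here the mask is assumed to be 1 on the known (uncorrupted) pixels, and the trained encoders and decoder are illustrative placeholders.

```python
import torch

@torch.no_grad()
def inpaint(encoder_s, encoder_u, decoder, x_corrupted, mask):
    z_s = encoder_s(x_corrupted)
    mu, _ = encoder_u(x_corrupted)
    x_hat = decoder(torch.cat([z_s, mu], dim=1))      # full reconstruction
    return mask * x_corrupted + (1.0 - mask) * x_hat  # keep known pixels, fill the rest
```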

5. Conclusion
We propose a latent space disentangling algorithm on the VAE baseline. Our model learns two separate encoders and divides the latent code into label relevant and irrelevant dimensions. Together with a discriminator in the pixel domain, we show that our model can generate high quality and diverse images, and it can also be applied to semi-supervised image generation, in which unlabeled data with unseen classes are given to the encoders. Future research includes building more interpretable latent dimensions with the help of more labels, and reducing the correlation between the label relevant and irrelevant codes in our framework.

References
[1] M. E. Abbasnejad, A. Dick, and A. van den Hengel. Infinite variational autoencoder for semi-supervised learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
[2] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. stat, 1050:9.
[3] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua. CVAE-GAN: Fine-grained image generation through asymmetric training. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE.
[4] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua. Towards open-set identity preserving face synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[5] M. Ben-Yosef and D. Weinshall. Gaussian mixture generative adversarial networks for diverse datasets, and the unsupervised clustering of images. arXiv preprint.
[6] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1.
[7] C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner. Understanding disentangling in β-VAE. arXiv preprint.
[8] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age. In Automatic Face & Gesture Recognition (FG 2018), IEEE International Conference on. IEEE.
[9] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems.
[10] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. arXiv preprint.
[11] V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville. Adversarially learned inference. In ICLR.
[12] Y. Ge, Z. Li, H. Zhao, G. Yin, X. Wang, and H. Li. FD-GAN: Pose-guided feature distilling GAN for robust person re-identification. In Advances in Neural Information Processing Systems.
[13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems.
[14] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems.
[15] N. Hadad, L. Wolf, and M. Shahar. A two-step disentanglement method. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[16] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR.
[17] M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley. Stochastic variational inference. The Journal of Machine Learning Research, 14(1).
[18] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint.
[19] Y. Kim, S. Wiseman, A. C. Miller, D. Sontag, and A. M. Rush. Semi-amortized variational autoencoders. arXiv preprint.
[20] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. stat, 1050:1.
[21] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer.
[22] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther.
Autoencoding beyond pixels using a learned similarity metric. In International Conference on Machine Learning.
[23] H.-Y. Lee, H.-Y. Tseng, J.-B. Huang, M. K. Singh, and M.-H. Yang. Diverse image-to-image translation via disentangled representations. In European Conference on Computer Vision.
[24] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems.
[25] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. arXiv preprint.
[26] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE.
[27] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems.
[28] L. Mescheder, S. Nowozin, and A. Geiger. Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks. In International Conference on Machine Learning (ICML). PMLR.
[29] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In ICLR.
[30] T. Miyato and M. Koyama. cGANs with projection discriminator. arXiv preprint.
[31] H.-W. Ng and S. Winkler. A data-driven approach to cleaning large face datasets. In Image Processing (ICIP), 2014 IEEE International Conference on. IEEE.
[32] S. Nowozin, B. Cseke, and R. Tomioka. f-GAN: Training generative neural samplers using variational divergence

minimization. In Advances in Neural Information Processing Systems.
[33] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning.
[34] D. J. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning.
[35] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3).
[36] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems.
[37] Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and D. Samaras. Neural face editing with intrinsic image disentangling. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE.
[38] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.
[39] K. Sohn, H. Lee, and X. Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems.
[40] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, volume 4.
[41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[42] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition (CVPR), Proceedings of the 2001 IEEE Computer Society Conference on, volume 1. IEEE.
[43] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds Dataset. Technical Report CNS-TR, California Institute of Technology.
[44] W. Wan, Y. Zhong, T. Li, and J. Chen. Rethinking feature distribution for loss functions in image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[45] Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2. IEEE.
[46] X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[47] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint, 2014.

Appendices

A. Mathematical Proofs

A.1. The ELBO of the log-likelihood objective
We declare in Equation 2 that, after dividing the latent space into the label relevant dimensions z_s and the label irrelevant dimensions z_u, the ELBO of the log-likelihood objective log p(x) becomes 3 terms in our setting:

\log p(x) = \log \int\!\!\int \sum_c p(x, z_s, z_u, c)\, dz_s\, dz_u \ge \mathbb{E}_{q_\psi(z_s|x),\, q_\phi(z_u|x)}[\log p_\theta(x|z_s, z_u)] - D_{KL}(q_\phi(z_u|x)\,\|\,p(z_u)) - D_{KL}(q_\psi(z_s, c|x)\,\|\,p(z_s, c))

Proof. Our generative process is described as follows. First, sample a label relevant code z_s together with its class c from p(z_s, c), and a label irrelevant code z_u from p(z_u). Then a decoder p_θ(x|z_s, z_u), taking the combination of z_s and z_u as input, maps the latent codes to images. Hence, we factorize the joint distribution p(x, z_s, z_u, c) as

p(x, z_s, z_u, c) = p_\theta(x|z_s, z_u)\, p(z_s, c)\, p(z_u)

By using Jensen's inequality, the log-likelihood log p(x) can be written as:

\begin{aligned}
\log p(x) &= \log \int\!\!\int \sum_c p(x, z_s, z_u, c)\, dz_s\, dz_u \\
&= \log \int\!\!\int \sum_c p_\theta(x|z_s, z_u)\, p(z_s, c)\, p(z_u)\, dz_s\, dz_u \\
&= \log \mathbb{E}_{q_\psi(z_s, c|x),\, q_\phi(z_u|x)}\Big[\frac{p_\theta(x|z_s, z_u)\, p(z_u)\, p(z_s, c)}{q_\psi(z_s, c|x)\, q_\phi(z_u|x)}\Big] \\
&\ge \mathbb{E}_{q_\psi(z_s, c|x),\, q_\phi(z_u|x)}\Big[\log \frac{p_\theta(x|z_s, z_u)\, p(z_u)\, p(z_s, c)}{q_\psi(z_s, c|x)\, q_\phi(z_u|x)}\Big] \\
&= \mathbb{E}_{q_\psi(z_s, c|x),\, q_\phi(z_u|x)}[\log p_\theta(x|z_s, z_u)] + \mathbb{E}\Big[\log \frac{p(z_u)}{q_\phi(z_u|x)}\Big] + \mathbb{E}\Big[\log \frac{p(z_s, c)}{q_\psi(z_s, c|x)}\Big] \\
&= \mathbb{E}_{q_\psi(z_s|x),\, q_\phi(z_u|x)}[\log p_\theta(x|z_s, z_u)] - D_{KL}(q_\phi(z_u|x)\,\|\,p(z_u)) - D_{KL}(q_\psi(z_s, c|x)\,\|\,p(z_s, c))
\end{aligned}

A.2. Log-likelihood regularization term in the label relevant branch
Note that the KL divergence D_KL(q_ψ(z_s, c|x) ‖ p(z_s, c)), the third term of the ELBO in Equation 2, is minimized. If we assume conditional independence between z_s and the class c given x, then we have

q_\psi(z_s, c|x) = q_\psi(z_s|x)\, p(c|x)

where p(c|x) is the one-hot encoding of the label y. If q_ψ(z_s|x) is formulated as a Gaussian distribution with Σ → 0 and mean ẑ_s output by encoder_s, it is actually a Dirac delta function:

q_\psi(z_s|x) = \delta(z_s - \hat{z}_s)

The KL regularization term then becomes

\begin{aligned}
D_{KL}(q_\psi(z_s, c|x)\,\|\,p(z_s, c)) &= \int \sum_c \delta(z_s - \hat{z}_s)\, p(c|x) \log \frac{\delta(z_s - \hat{z}_s)\, p(c|x)}{p(z_s|c)\, p(c)}\, dz_s \\
&= -\int \sum_c \delta(z_s - \hat{z}_s)\, p(c|x) \log p(z_s|c)\, dz_s + \int \sum_c \delta(z_s - \hat{z}_s)\, p(c|x) \log \frac{p(c|x)}{p(c)}\, dz_s \\
&\quad + \int \sum_c \delta(z_s - \hat{z}_s)\, p(c|x) \log \delta(z_s - \hat{z}_s)\, dz_s \\
&= -\sum_c p(c|x) \log p(\hat{z}_s|c) + \sum_c p(c|x) \log \frac{p(c|x)}{p(c)} + \int \delta(z_s - \hat{z}_s) \log \delta(z_s - \hat{z}_s)\, dz_s
\end{aligned}

The second term relates only to the prior distribution, so it can be regarded as a constant. The third term is the negative entropy of the delta function and has nothing to do with ẑ_s, hence we consider it a constant too. Therefore, we have

D_{KL}(q_\psi(z_s, c|x)\,\|\,p(z_s, c)) = -\sum_c p(c|x) \log p(\hat{z}_s|c) + \mathrm{Const.} = -\sum_c \mathbb{I}(c = y) \log p(\hat{z}_s|c) + \mathrm{Const.}

where the prior distribution p(ẑ_s|c) is set to N(ẑ_s; µ_c, Σ_c). Ignoring the constant term, it turns out to be the likelihood regularization term L_lkd in Equation 7:

L_{lkd} = -\sum_c \mathbb{I}(c = y) \log \mathcal{N}(\hat{z}_s; \mu_c, \Sigma_c) = -\log \mathcal{N}(\hat{z}_s; \mu_y, \Sigma_y)

A.3. Cross-entropy objective in the label relevant branch
To encourage z_s to become as label relevant as possible, the mutual information I(z_s; c) is maximized,

where z_s ∼ q_ψ(z_s|x). In practice, I(z_s; c) is hard to optimize directly because it requires access to p(c|z_s). We can instead optimize its lower bound by introducing an auxiliary distribution q(c|z_s) to approximate p(c|z_s), as in InfoGAN [9]:

\begin{aligned}
I(z_s; c) &= H(c) - H(c|z_s) \\
&= H(c) + \mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\, \mathbb{E}_{p(c|z_s)}[\log p(c|z_s)] \\
&= H(c) + \mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\, \mathbb{E}_{p(c|z_s)}\Big[\log \frac{p(c|z_s)}{q(c|z_s)}\Big] + \mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\, \mathbb{E}_{p(c|z_s)}[\log q(c|z_s)] \\
&\ge H(c) + \mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\, \mathbb{E}_{p(c|z_s)}[\log q(c|z_s)]
\end{aligned}

Since we still need to sample from p(c|z_s) in the inner expectation, we adopt Lemma 5.1 in InfoGAN to further remove the need for p(c|z_s). The first term of the lower bound is a constant, so we ignore it. The second term becomes

\begin{aligned}
\mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\, \mathbb{E}_{p(c|z_s)}[\log q(c|z_s)] &= \mathbb{E}_{p(c)}\, \mathbb{E}_{p(x|c)}\, \mathbb{E}_{q_\psi(z_s|x)}\, \mathbb{E}_{p(c'|z_s)}[\log q(c'|z_s)] \\
&= \int\!\!\int \sum_c \sum_{c'} p(c)\, p(x|c)\, q_\psi(z_s|x)\, p(c'|z_s) \log q(c'|z_s)\, dx\, dz_s \\
&= \int \sum_c \sum_{c'} p(c)\, p(c'|z_s) \log q(c'|z_s) \Big[\int p(x|c)\, q_\psi(z_s|x)\, dx\Big] dz_s
\end{aligned}

We hold the assumption that the process of sampling z_s given x is independent of c, thus

\int p(x|c)\, q_\psi(z_s|x)\, dx = \int p(z_s, x|c)\, dx = p(z_s|c)

According to Lemma 5.1 in InfoGAN, we have

\int \sum_c \sum_{c'} p(c)\, p(z_s|c)\, p(c'|z_s) \log q(c'|z_s)\, dz_s = \int \sum_c p(c)\, p(z_s|c) \log q(c|z_s)\, dz_s

Hence

\int \sum_c \sum_{c'} p(c)\, p(c'|z_s) \log q(c'|z_s) \Big[\int p(x|c)\, q_\psi(z_s|x)\, dx\Big] dz_s = \int \sum_c p(c)\, p(z_s|c) \log q(c|z_s)\, dz_s

We further factorize p(z_s|c) as \int p(x|c)\, q_\psi(z_s|x)\, dx, so the equation above becomes

\begin{aligned}
\int \sum_c p(c)\, p(z_s|c) \log q(c|z_s)\, dz_s &= \int\!\!\int \sum_c p(c)\, p(x|c)\, q_\psi(z_s|x) \log q(c|z_s)\, dz_s\, dx \\
&= \int\!\!\int \sum_c p(x)\, q_\psi(z_s|x)\, p(c|x) \log q(c|z_s)\, dz_s\, dx \\
&= \mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\Big[\sum_c p(c|x) \log q(c|z_s)\Big]
\end{aligned}

where p(c|x) is the one-hot encoding of the label y, i.e. p(c|x) = I(c = y). Maximizing it is equivalent to minimizing its opposite, which is exactly the classification loss in Section 3.3:

L_{cls} = \mathbb{E}_{p(x)}\, \mathbb{E}_{q_\psi(z_s|x)}\Big[-\sum_c \mathbb{I}(c = y) \log q(c|z_s)\Big]

B. Experimental Details

B.1. Dataset synthesis for the toy example
Our synthetic dataset for the toy example is a modification of the two-moon dataset, containing three half circles instead of two. The generative process is as follows. First, sample data points from three half unit circles with a horizontal interval of 2.2. Then, add Gaussian noise with std = 0.15 to all of them.

B.2. Network architecture for FaceScrub
For the two encoders, Encoder_s and Encoder_u, we use the VGG [38] architecture with batch normalization layers added to each layer, and replace the last three fc layers with two fc layers of 1024 and 512 units. For the decoder, an inverse structure of the encoders is applied. The adversarial classifier in Section 3.2 consists of two fc layers of 256 and 530 units, and the discriminator contains 7 convolution layers and two fc layers (details are shown in Table 4). Note that spectral normalization [29] is applied to all of the weights in the discriminator, and the label embedding is incorporated in the first fc layer as in [30].

B.3. Network architecture for CIFAR-10
The network structures of the two encoders, the decoder and the discriminator for CIFAR-10 are shown in Table 3. The adversarial classifier in the latent space is similar to the one used for FaceScrub, with two fc layers of 256 and 10 units. Also, spectral normalization and the label embedding are applied in the discriminator.

B.4. Optimization
We use the Adam optimizer with β_1 = 0 and β_2 = 0.9.

Table 3. The network structure for CIFAR-10.
Encoder: input x; 3×3 conv, 32, stride 2, batchnorm, relu; 5×5 conv, 64, stride 2, batchnorm, relu; 3×3 conv, 128, stride 2, batchnorm, relu; 3×3 conv, 256, stride 2, batchnorm, relu; fc, 1024, batchnorm, relu; fc, 100 (for z_s) / 200 (for z_u).
Decoder: input z_s ∈ R^100, z_u ∈ R^200; concat; fc, 1024, batchnorm, relu; 5×5 conv, 256, stride 2, batchnorm, relu; 5×5 conv, 256, stride 1, batchnorm, relu; 5×5 conv, 128, stride 2, batchnorm, relu; 5×5 conv, 64, stride 2, batchnorm, relu; 5×5 conv, 32, stride 2, batchnorm, relu; 5×5 conv, 3, stride 1, tanh.
Discriminator: input x; 5×5 conv, 32, stride 1, lrelu; 5×5 conv, 128, stride 2, lrelu; 5×5 conv, 256, stride 2, lrelu; 5×5 conv, 256, stride 2, lrelu; fc, 512, lrelu; fc.

Table 4. The network structure of the discriminator for FaceScrub: input x; 3×3 conv, 64, stride 2, lrelu; 3×3 conv, 128, stride 2, lrelu; 3×3 conv, 256, stride 1, lrelu; 3×3 conv, 256, stride 2, lrelu; 3×3 conv, 512, stride 1, lrelu; 3×3 conv, 512, stride 2, lrelu; 3×3 conv, 512, stride 2, lrelu; global average pooling; fc, 1024, lrelu; fc, 1.

Since, in the training process, the first stage using L_GM is trained 3 times per second-stage iteration, L_GM converges fast. Continuing to train after it converges will cause instability of L_GM, because Σ_c gradually goes down. In practice, we decay the learning rate of Σ_c by 0.01 after 2 epochs.

B.5. Inception Score
Recall that the Inception Score requires access to the conditional class probability p(y|x). We use a classification model with the Inception-ResNet-v1 [40] architecture, trained on VGGFace2 [8], to evaluate generative models trained on FaceScrub. For generative models trained on CIFAR-10, a classification model with the Inception-v3 [41] architecture trained on ImageNet [35] is used.

C. Additional Experiment Results

C.1. More generated samples on FaceScrub and CIFAR-10
Figure 8 shows generated samples of our method on FaceScrub and CIFAR-10, with each row corresponding to a certain class.

C.2. Additional experiments on CUB and CIFAR-100
We additionally apply our method to the CUB [43] and CIFAR-100 [21] datasets. CUB contains 200 categories of birds with 11,788 images in total. For CUB, we crop the images according to the bounding boxes provided by the dataset and resize the cropped images. The network structure is the same as the one used for FaceScrub. For CIFAR-100, we use the same network as for CIFAR-10. Generated images are shown in Figure 9. Results of Inception Score and intra-class diversity are listed in Table 5 and Table 6, respectively.

Table 5. Inception Score of different methods (CVAE [39], cGAN [30], CVAE-GAN [3], and ours) on CUB and CIFAR-100.
Table 6. Intra-class diversity of different methods (CVAE-GAN [3] and ours) on CUB and CIFAR-100.
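As an illustration of Table 4 and the notes in Appendix B.2, below is a hedged PyTorch sketch of the FaceScrub discriminator (conv stack, global average pooling, fc 1024, fc 1) with spectral normalization on its weights. The projection-style use of the label embedding is one plausible reading of "incorporated as in [30]", and the embedding size and class count are assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(cin, cout, stride):
    return spectral_norm(nn.Conv2d(cin, cout, 3, stride=stride, padding=1))

class FaceDiscriminator(nn.Module):
    def __init__(self, num_classes=530):
        super().__init__()
        # 7 conv layers of Table 4: 64/s2, 128/s2, 256/s1, 256/s2, 512/s1, 512/s2, 512/s2
        chans = [(3, 64, 2), (64, 128, 2), (128, 256, 1), (256, 256, 2),
                 (256, 512, 1), (512, 512, 2), (512, 512, 2)]
        layers = []
        for cin, cout, s in chans:
            layers += [sn_conv(cin, cout, s), nn.LeakyReLU(0.2)]
        self.features = nn.Sequential(*layers)
        self.fc = spectral_norm(nn.Linear(512, 1024))
        self.act = nn.LeakyReLU(0.2)
        self.out = spectral_norm(nn.Linear(1024, 1))
        self.embed = spectral_norm(nn.Embedding(num_classes, 1024))

    def forward(self, x, y):
        h = self.features(x).mean(dim=[2, 3])            # global average pooling
        h = self.act(self.fc(h))
        # projection term: inner product between the label embedding and h
        return self.out(h) + (self.embed(y) * h).sum(dim=1, keepdim=True)
```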


More information

Inverting The Generator Of A Generative Adversarial Network

Inverting The Generator Of A Generative Adversarial Network 1 Inverting The Generator Of A Generative Adversarial Network Antonia Creswell and Anil A Bharath, Imperial College London arxiv:1802.05701v1 [cs.cv] 15 Feb 2018 Abstract Generative adversarial networks

More information

SYNTHESIS OF IMAGES BY TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS. Qiang Huang, Philip J.B. Jackson, Mark D. Plumbley, Wenwu Wang

SYNTHESIS OF IMAGES BY TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS. Qiang Huang, Philip J.B. Jackson, Mark D. Plumbley, Wenwu Wang SYNTHESIS OF IMAGES BY TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS Qiang Huang, Philip J.B. Jackson, Mark D. Plumbley, Wenwu Wang Centre for Vision, Speech and Signal Processing University of Surrey, Guildford,

More information

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules Improved Vehile Classifiation in Long Traffi Video by Cooperating Traker and Classifier Modules Brendan Morris and Mohan Trivedi University of California, San Diego San Diego, CA 92093 {b1morris, trivedi}@usd.edu

More information

Video Data and Sonar Data: Real World Data Fusion Example

Video Data and Sonar Data: Real World Data Fusion Example 14th International Conferene on Information Fusion Chiago, Illinois, USA, July 5-8, 2011 Video Data and Sonar Data: Real World Data Fusion Example David W. Krout Applied Physis Lab dkrout@apl.washington.edu

More information

Generalized Loss-Sensitive Adversarial Learning with Manifold Margins

Generalized Loss-Sensitive Adversarial Learning with Manifold Margins Generalized Loss-Sensitive Adversarial Learning with Manifold Margins Marzieh Edraki and Guo-Jun Qi Laboratory for MAchine Perception and LEarning (MAPLE) http://maple.cs.ucf.edu/ University of Central

More information

arxiv: v2 [stat.ml] 21 Oct 2017

arxiv: v2 [stat.ml] 21 Oct 2017 Variational Approaches for Auto-Encoding Generative Adversarial Networks arxiv:1706.04987v2 stat.ml] 21 Oct 2017 Mihaela Rosca Balaji Lakshminarayanan David Warde-Farley Shakir Mohamed DeepMind {mihaelacr,balajiln,dwf,shakir}@google.com

More information

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors Eurographis Symposium on Geometry Proessing (003) L. Kobbelt, P. Shröder, H. Hoppe (Editors) Rotation Invariant Spherial Harmoni Representation of 3D Shape Desriptors Mihael Kazhdan, Thomas Funkhouser,

More information

A scheme for racquet sports video analysis with the combination of audio-visual information

A scheme for racquet sports video analysis with the combination of audio-visual information A sheme for raquet sports video analysis with the ombination of audio-visual information Liyuan Xing a*, Qixiang Ye b, Weigang Zhang, Qingming Huang a and Hua Yu a a Graduate Shool of the Chinese Aadamy

More information

Relevance for Computer Vision

Relevance for Computer Vision The Geometry of ROC Spae: Understanding Mahine Learning Metris through ROC Isometris, by Peter A. Flah International Conferene on Mahine Learning (ICML-23) http://www.s.bris.a.uk/publiations/papers/74.pdf

More information

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching Chunyuan Li 1, Hao Liu 2, Changyou Chen 3, Yunchen Pu 1, Liqun Chen 1, Ricardo Henao 1 and Lawrence Carin 1 1 Duke University

More information

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation Alexander H. Liu 1 Yen-Cheng Liu 2 Yu-Ying Yeh 3 Yu-Chiang Frank Wang 1,4 1 National Taiwan University, Taiwan 2 Georgia

More information

arxiv: v1 [cs.cv] 16 Jul 2017

arxiv: v1 [cs.cv] 16 Jul 2017 enerative adversarial network based on resnet for conditional image restoration Paper: jc*-**-**-****: enerative Adversarial Network based on Resnet for Conditional Image Restoration Meng Wang, Huafeng

More information

8 : Learning Fully Observed Undirected Graphical Models

8 : Learning Fully Observed Undirected Graphical Models 10-708: Probabilisti Graphial Models 10-708, Spring 2018 8 : Learning Fully Observed Undireted Graphial Models Leturer: Kayhan Batmanghelih Sribes: Chenghui Zhou, Cheng Ran (Harvey) Zhang When learning

More information

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq Volume 4 Issue 6 June 014 ISSN: 77 18X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om Medial Image Compression using

More information

Accommodations of QoS DiffServ Over IP and MPLS Networks

Accommodations of QoS DiffServ Over IP and MPLS Networks Aommodations of QoS DiffServ Over IP and MPLS Networks Abdullah AlWehaibi, Anjali Agarwal, Mihael Kadoh and Ahmed ElHakeem Department of Eletrial and Computer Department de Genie Eletrique Engineering

More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

CleanUp: Improving Quadrilateral Finite Element Meshes

CleanUp: Improving Quadrilateral Finite Element Meshes CleanUp: Improving Quadrilateral Finite Element Meshes Paul Kinney MD-10 ECC P.O. Box 203 Ford Motor Company Dearborn, MI. 8121 (313) 28-1228 pkinney@ford.om Abstrat: Unless an all quadrilateral (quad)

More information

19: Inference and learning in Deep Learning

19: Inference and learning in Deep Learning 10-708: Probabilistic Graphical Models 10-708, Spring 2017 19: Inference and learning in Deep Learning Lecturer: Zhiting Hu Scribes: Akash Umakantha, Ryan Williamson 1 Classes of Deep Generative Models

More information

IMPLICIT AUTOENCODERS

IMPLICIT AUTOENCODERS IMPLICIT AUTOENCODERS Anonymous authors Paper under double-blind review ABSTRACT In this paper, we describe the implicit autoencoder (IAE), a generative autoencoder in which both the generative path and

More information

Auto-encoder with Adversarially Regularized Latent Variables

Auto-encoder with Adversarially Regularized Latent Variables Information Engineering Express International Institute of Applied Informatics 2017, Vol.3, No.3, P.11 20 Auto-encoder with Adversarially Regularized Latent Variables for Semi-Supervised Learning Ryosuke

More information

Exploring the Commonality in Feature Modeling Notations

Exploring the Commonality in Feature Modeling Notations Exploring the Commonality in Feature Modeling Notations Miloslav ŠÍPKA Slovak University of Tehnology Faulty of Informatis and Information Tehnologies Ilkovičova 3, 842 16 Bratislava, Slovakia miloslav.sipka@gmail.om

More information

Semi-Amortized Variational Autoencoders

Semi-Amortized Variational Autoencoders Semi-Amortized Variational Autoencoders Yoon Kim Sam Wiseman Andrew Miller David Sontag Alexander Rush Code: https://github.com/harvardnlp/sa-vae Background: Variational Autoencoders (VAE) (Kingma et al.

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Using Augmented Measurements to Improve the Convergence of ICP

Using Augmented Measurements to Improve the Convergence of ICP Using Augmented Measurements to Improve the onvergene of IP Jaopo Serafin, Giorgio Grisetti Dept. of omputer, ontrol and Management Engineering, Sapienza University of Rome, Via Ariosto 25, I-0085, Rome,

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

arxiv: v1 [cs.cv] 4 Nov 2018

arxiv: v1 [cs.cv] 4 Nov 2018 Improving GAN with neighbors embedding and gradient matching Ngoc-Trung Tran, Tuan-Anh Bui, Ngai-Man Cheung ST Electronics - SUTD Cyber Security Laboratory Singapore University of Technology and Design

More information

Deep generative models of natural images

Deep generative models of natural images Spring 2016 1 Motivation 2 3 Variational autoencoders Generative adversarial networks Generative moment matching networks Evaluating generative models 4 Outline 1 Motivation 2 3 Variational autoencoders

More information

One Against One or One Against All : Which One is Better for Handwriting Recognition with SVMs?

One Against One or One Against All : Which One is Better for Handwriting Recognition with SVMs? One Against One or One Against All : Whih One is Better for Handwriting Reognition with SVMs? Jonathan Milgram, Mohamed Cheriet, Robert Sabourin To ite this version: Jonathan Milgram, Mohamed Cheriet,

More information

Learning to generate with adversarial networks

Learning to generate with adversarial networks Learning to generate with adversarial networks Gilles Louppe June 27, 2016 Problem statement Assume training samples D = {x x p data, x X } ; We want a generative model p model that can draw new samples

More information

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT?

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? 3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? Bernd Girod, Peter Eisert, Marus Magnor, Ekehard Steinbah, Thomas Wiegand Te {girod eommuniations Laboratory, University of Erlangen-Nuremberg

More information

Contour Box: Rejecting Object Proposals Without Explicit Closed Contours

Contour Box: Rejecting Object Proposals Without Explicit Closed Contours Contour Box: Rejeting Objet Proposals Without Expliit Closed Contours Cewu Lu, Shu Liu Jiaya Jia Chi-Keung Tang The Hong Kong University of Siene and Tehnology Stanford University The Chinese University

More information

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition A Coarse-to-Fine Classifiation Sheme for Faial Expression Reognition Xiaoyi Feng 1,, Abdenour Hadid 1 and Matti Pietikäinen 1 1 Mahine Vision Group Infoteh Oulu and Dept. of Eletrial and Information Engineering

More information

Weak Dependence on Initialization in Mixture of Linear Regressions

Weak Dependence on Initialization in Mixture of Linear Regressions Proeedings of the International MultiConferene of Engineers and Computer Sientists 8 Vol I IMECS 8, Marh -6, 8, Hong Kong Weak Dependene on Initialization in Mixture of Linear Regressions Ryohei Nakano

More information

Segmentation of brain MR image using fuzzy local Gaussian mixture model with bias field correction

Segmentation of brain MR image using fuzzy local Gaussian mixture model with bias field correction IOSR Journal of VLSI and Signal Proessing (IOSR-JVSP) Volume 2, Issue 2 (Mar. Apr. 2013), PP 35-41 e-issn: 2319 4200, p-issn No. : 2319 4197 Segmentation of brain MR image using fuzzy loal Gaussian mixture

More information

arxiv: v1 [cs.cv] 8 Jan 2019

arxiv: v1 [cs.cv] 8 Jan 2019 GILT: Generating Images from Long Text Ori Bar El, Ori Licht, Netanel Yosephian Tel-Aviv University {oribarel, oril, yosephian}@mail.tau.ac.il arxiv:1901.02404v1 [cs.cv] 8 Jan 2019 Abstract Creating an

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

We don t need no generation - a practical approach to sliding window RLNC

We don t need no generation - a practical approach to sliding window RLNC We don t need no generation - a pratial approah to sliding window RLNC Simon Wunderlih, Frank Gabriel, Sreekrishna Pandi, Frank H.P. Fitzek Deutshe Telekom Chair of Communiation Networks, TU Dresden, Dresden,

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

Adversarial Symmetric Variational Autoencoder

Adversarial Symmetric Variational Autoencoder Adversarial Symmetric Variational Autoencoder Yunchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li and Lawrence Carin Department of Electrical and Computer Engineering, Duke University

More information

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of

More information

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints Smooth Trajetory Planning Along Bezier Curve for Mobile Robots with Veloity Constraints Gil Jin Yang and Byoung Wook Choi Department of Eletrial and Information Engineering Seoul National University of

More information

Semi-Supervised Affinity Propagation with Instance-Level Constraints

Semi-Supervised Affinity Propagation with Instance-Level Constraints Semi-Supervised Affinity Propagation with Instane-Level Constraints Inmar E. Givoni, Brendan J. Frey Probabilisti and Statistial Inferene Group University of Toronto 10 King s College Road, Toronto, Ontario,

More information

Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM

TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM M. Murugeswari 1, M.Gayathri 2 1 Assoiate Professor, 2 PG Sholar 1,2 K.L.N College of Information

More information

Time delay estimation of reverberant meeting speech: on the use of multichannel linear prediction

Time delay estimation of reverberant meeting speech: on the use of multichannel linear prediction University of Wollongong Researh Online Faulty of Informatis - apers (Arhive) Faulty of Engineering and Information Sienes 7 Time delay estimation of reverberant meeting speeh: on the use of multihannel

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

Partial Character Decoding for Improved Regular Expression Matching in FPGAs

Partial Character Decoding for Improved Regular Expression Matching in FPGAs Partial Charater Deoding for Improved Regular Expression Mathing in FPGAs Peter Sutton Shool of Information Tehnology and Eletrial Engineering The University of Queensland Brisbane, Queensland, 4072, Australia

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

EASY TRANSFER LEARNING BY EXPLOITING INTRA-DOMAIN STRUCTURES

EASY TRANSFER LEARNING BY EXPLOITING INTRA-DOMAIN STRUCTURES EASY TRANSFER LEARNING BY EXPLOITING INTRA-DOMAIN STRUCTURES Jindong Wang 1, Yiqiang Chen 1,, Han Yu 2, Meiyu Huang 3, Qiang Yang 4 1 Beiing Key Lab. of Mobile Computing and Pervasive Devie, Inst. of Computing

More information

GAN Frontiers/Related Methods

GAN Frontiers/Related Methods GAN Frontiers/Related Methods Improving GAN Training Improved Techniques for Training GANs (Salimans, et. al 2016) CSC 2541 (07/10/2016) Robin Swanson (robin@cs.toronto.edu) Training GANs is Difficult

More information

Introduction to Seismology Spring 2008

Introduction to Seismology Spring 2008 MIT OpenCourseWare http://ow.mit.edu 1.510 Introdution to Seismology Spring 008 For information about iting these materials or our Terms of Use, visit: http://ow.mit.edu/terms. 1.510 Leture Notes 3.3.007

More information

arxiv: v1 [cs.cv] 6 Sep 2018

arxiv: v1 [cs.cv] 6 Sep 2018 arxiv:1809.01890v1 [cs.cv] 6 Sep 2018 Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto

More information

DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION

DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION Yen-Cheng Liu 1, Wei-Chen Chiu 2, Sheng-De Wang 1, and Yu-Chiang Frank Wang 1 1 Graduate Institute of Electrical Engineering,

More information

Naïve Bayes Slides are adapted from Sebastian Thrun (Udacity ), Ke Chen Jonathan Huang and H. Witten s and E. Frank s Data Mining and Jeremy Wyatt,

Naïve Bayes Slides are adapted from Sebastian Thrun (Udacity ), Ke Chen Jonathan Huang and H. Witten s and E. Frank s Data Mining and Jeremy Wyatt, Naïve Bayes Slides are adapted from Sebastian Thrun (Udaity ), Ke Chen Jonathan Huang and H. Witten s and E. Frank s Data Mining and Jeremy Wyatt, Bakground There are three methods to establish a lassifier

More information

Alternatives to Direct Supervision

Alternatives to Direct Supervision CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of

More information

Auto-Encoding Variational Bayes

Auto-Encoding Variational Bayes Auto-Encoding Variational Bayes Diederik P (Durk) Kingma, Max Welling University of Amsterdam Ph.D. Candidate, advised by Max Durk Kingma D.P. Kingma Max Welling Problem class Directed graphical model:

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

Preliminary investigation of multi-wavelet denoising in partial discharge detection

Preliminary investigation of multi-wavelet denoising in partial discharge detection Preliminary investigation of multi-wavelet denoising in partial disharge detetion QIAN YONG Department of Eletrial Engineering Shanghai Jiaotong University 8#, Donghuan Road, Minhang distrit, Shanghai

More information

GAN and Feature Representation. Hung-yi Lee

GAN and Feature Representation. Hung-yi Lee GAN and Feature Representation Hung-yi Lee Outline Generator (Decoder) Discrimi nator + Encoder GAN+Autoencoder x InfoGAN Encoder z Generator Discrimi (Decoder) x nator scalar Discrimi z Generator x scalar

More information

Gradient based progressive probabilistic Hough transform

Gradient based progressive probabilistic Hough transform Gradient based progressive probabilisti Hough transform C.Galambos, J.Kittler and J.Matas Abstrat: The authors look at the benefits of exploiting gradient information to enhane the progressive probabilisti

More information

arxiv: v2 [cs.lg] 7 Feb 2019

arxiv: v2 [cs.lg] 7 Feb 2019 Implicit Autoencoders Alireza Makhzani Vector Institute for Artificial Intelligence University of Toronto makhzani@vectorinstitute.ai arxiv:1805.09804v2 [cs.lg 7 Feb 2019 Abstract In this paper, we describe

More information

Naïve Bayesian Rough Sets Under Fuzziness

Naïve Bayesian Rough Sets Under Fuzziness IJMSA: Vol. 6, No. 1-2, January-June 2012, pp. 19 25 Serials Publiations ISSN: 0973-6786 Naïve ayesian Rough Sets Under Fuzziness G. GANSAN 1,. KRISHNAVNI 2 T. HYMAVATHI 3 1,2,3 Department of Mathematis,

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION

DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION 2017 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 25 28, 2017, TOKYO, JAPAN DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION Yen-Cheng Liu 1,

More information