Naturalistic Image Synthesis Using Variational Auto-Encoder


Submitted as a project report for EECS 294 in Fall 2016 by Raaz Dwivedi (raaz.rsk@eecs.berkeley.edu) and Orhan Ocal (ocal@eecs.berkeley.edu).

Abstract

We develop a deep generative model for naturalistic image synthesis using variational auto-encoders (VAE). Our model uses convolutional and fully-connected layers and includes an l2 loss on features extracted from a VGGNet that was pre-trained for classification on the ImageNet dataset. The feature loss is used to enhance the naturalness of the generated images. These choices deviate from the traditional fully-connected models that use only pixel and latent loss for training VAEs. We show that using convolutional layers in the model improves the performance for reconstruction and generation of images from the trained network. Although we obtain good results on the MNIST handwritten digits dataset, we were unable to generate realistic images using the diverse CIFAR-10 dataset. Furthermore, we could not conclude whether incorporating feature consistency in the loss function led to better results. Hence, our results deviate from the findings presented in a recent paper by Hou et al. [7], where the authors used the CelebFaces Attributes (CelebA) dataset and showed that incorporating feature loss from a pre-trained VGGNet helped their VAE generate more realistic images compared to the existing models in the literature.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Datasets
  1.3 Evaluating the Generative Model
  1.4 Organization
2 Theory behind VAE
3 Baseline Models
4 Our Network and Implementation
  4.1 Implementation and Tools
5 Our Results
  5.1 Remarks
6 State of the Art
  6.1 Remarks
7 Discussion

1 Introduction

Understanding the nature around us has been an interesting problem for centuries. Teaching the same to a machine, albeit a recent problem, has led to many interesting research questions. The scientific community has long been trying to model nature in many fields. The philosophy behind trying to understand the nature around us can be rightly attributed to Feynman's quote "What I cannot create, I do not understand." Learning a generative model is one of the many approaches that try to accomplish this goal. Given a dataset, a generative model is a model for randomly generating observable values similar to that dataset. The goal is beyond reproducing the dataset; that is, the model should be able to generate images that are not replicas of members of the training data, yet similar to the data. Such a model is helpful for many purposes. On one hand, if the model is compact, one has successfully compressed the data, which makes saving and transferring the data for many machine learning tasks much easier. To name a few applications, it can help obtain larger training sets for neural networks, de-noise images, perform in-painting, and so on. Besides some domain-specific concrete examples, generative models are also very useful in an abstract sense for the field of Artificial Intelligence. Human beings are intelligent because, with time and experience, they become very good at predicting the outcomes of many actions, and decide their actions by taking into account the anticipated outcome. This is because, with time, they learn the generative model for various natural processes. Thus, learning a generative model is a necessity for a robot if it wants to become as intelligent as a human and make decisions about which action to perform.

1.1 Problem Statement

We want to learn a generative model for natural images. For this problem, various approaches have been taken in the deep learning community, including Variational Auto-Encoders, Generative Adversarial Networks (GAN) and Pixel Recurrent Neural Networks. In this project we target synthesizing natural-looking images using Variational Auto-Encoders (VAE). This approach was introduced by Kingma and Welling [9] and has been widely researched since then. We start with that first work and build up to a very recent work [7].

1.2 Datasets

We use two datasets that are well known in the machine learning community. First is the MNIST dataset of handwritten digits [13], with a training set of 60,000 samples and a test set of 10,000 samples. The handwritten digits have been size-normalized and centered in a 28 x 28 monochromatic pixel image. Second is the CIFAR-10 dataset consisting of natural images [11]. It consists of 60,000 32 x 32 RGB color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The classes are mutually exclusive (for example, there is no overlap between automobiles and trucks). Both datasets can be loaded as shown in the sketch below.
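To make the data setup concrete, here is a minimal loading sketch in Python. It assumes a TensorFlow version that bundles the tf.keras dataset helpers (not necessarily the version we used in 2016); the variable names are illustrative.

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test grayscale 28 x 28 images.
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28 * 28).astype('float32') / 255.0  # scale to [0, 1]

# Load CIFAR-10: 50,000 training and 10,000 test 32 x 32 RGB images.
(c_train, _), (c_test, _) = tf.keras.datasets.cifar10.load_data()
c_train = c_train.astype('float32') / 255.0
```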

1.3 Evaluating the Generative Model

In the literature [9, 6, 17, 7, 19], we were able to find three key ways to quantify the quality of the VAE built and compare it across various models: (1) likelihood of the training and test data under the trained VAE; (2) visual appeal of the reconstructed and generated images; (3) classification accuracy for the unlabelled images, when the network is trained with partially labelled data. In this project, because our goal is to generate natural-looking images, and not specifically to capture the likelihood function well, we choose to evaluate our models using the visual appeal of the reconstructed and generated images.

We build a model that has two parts: an encoder and a decoder. The encoder takes an input image and outputs useful features of the image. The decoder, on the other hand, constructs an image given the features of the image. Informally (to be made precise in Section 2), the decoder can be seen as the generative model, and the encoder can be seen as providing the codebook for the model. The two operations that a VAE targets are: (a) Reconstruction: if we input an image to the encoder, and pass its output (the code of the input image) through the decoder, it should output an image that matches the input image; (b) Generation: if we input a random signal (noise) to the decoder, it should use it to generate a random code from the codebook and use that code to output a natural-looking image. Fig. 1 illustrates these two operations; the sketch after this section makes them concrete. To be consistent with the literature, we shall refer to the code as the latent variable.

Fig. 1: (top) Reconstruction of an image when passed through both encoder and decoder. (bottom) When fed with noise, the decoder outputs a natural-looking image.
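The following minimal Python sketch pins down the two operations. The names encode, decode and latent_dim are hypothetical stand-ins for the trained encoder network, decoder network and code dimension:

```python
import numpy as np

def reconstruct(x, encode, decode):
    """Operation (a): pass an image through encoder then decoder."""
    mu_z, sigma_z = encode(x)                          # code distribution for x
    z = mu_z + sigma_z * np.random.randn(*mu_z.shape)  # sample a code
    return decode(z)                                   # should match x

def generate(decode, latent_dim=100):
    """Operation (b): feed noise from the prior N(0, I) to the decoder."""
    z = np.random.randn(latent_dim)
    return decode(z)
```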

1.4 Organization

The organization of this report is as follows. We briefly discuss the theoretical foundations of VAEs in Section 2. We then present, in Section 3, the baseline model from the work that introduced VAEs to the literature. We discuss the details of the network that we implemented, and the results, in Sections 4 and 5 respectively. In Section 6, we report the performance of various state-of-the-art models and contrast them with our results. We end, in Section 7, with an ongoing discussion in the deep learning community about how to approach learning a generative model.

2 Theory behind VAE

Suppose we are given a dataset {x_i, i = 1, ..., N} consisting of N independent and identically distributed samples of some random variable x, whose distribution is unknown. We assume that the generation of x is a two-step process: first, a value z is generated from some prior distribution p_θ(z), and then a value x is drawn from a conditional distribution p_θ(x|z). The issues are: θ is unknown, and we only observe samples of x. We usually refer to x as the data, while z (the unobserved random variable) is called the hidden/latent variable (the code). We further assume that the distributions involved belong to a parametric class indexed by a parameter θ ∈ Θ, and that the distributions have smooth densities associated with them. A popular scheme to estimate θ is to approximate it with the maximum-likelihood estimate for the observed data:

    \hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta \in \Theta} p_\theta(x_1, \ldots, x_N).

When Θ is complex, maximizing the likelihood can become a hard problem. Often, adopting the viewpoint of two-step generation with a hidden variable comes in handy. One can write a lower bound on the log-likelihood function as follows (using p := p_θ to de-clutter notation):

    \log p(x) = \log \int p(x, z)\, dz
              = \log \int \frac{p(x)\, p(z \mid x)}{q(z \mid x)}\, q(z \mid x)\, dz
              = \log \mathbb{E}_{z \sim q(\cdot \mid x)}\left[ \frac{p(x)\, p(z \mid x)}{q(z \mid x)} \right]
              \geq \mathbb{E}_{z \sim q(\cdot \mid x)}\left[ \log \frac{p(x)\, p(z \mid x)}{q(z \mid x)} \right],

where the inequality follows from Jensen's inequality. In fact, it turns out that if we optimize over q, this becomes an equality; that is, we can express log p(x) as follows:

    \log p(x) = \max_q \mathbb{E}_{z \sim q(\cdot \mid x)}\left[ \log \frac{p(x)\, p(z \mid x)}{q(z \mid x)} \right]   (1)
              = \max_q \left[ \log p(x) - D(q(z \mid x) \,\|\, p(z \mid x)) \right],   (2)

where D(· || ·) is the Kullback-Leibler divergence, and the optimizer (argmax) is given by q(z|x) = p(z|x). Converting the objective to a maximization (or minimization) problem like the above is known as the variational principle. It is a technique to convert a hard problem to a simpler one; for example, the Expectation-Maximization algorithm optimizes equation (2) over p(·) and q(·|x) iteratively, in turns, to try to maximize the likelihood function.

Using similar techniques and Bayes' rule, p(x) = p(z)p(x|z)/p(z|x), one can derive an alternate equality for log p(x) as follows:

    \log p(x) = \mathbb{E}_{q(z \mid x)}[\log p(x)]
              = \mathbb{E}_{q(z \mid x)}\left[ \log \left( \frac{p(z)\, p(x \mid z)}{p(z \mid x)} \cdot \frac{q(z \mid x)}{q(z \mid x)} \right) \right]
              = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] + \int q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x)}\, dz - \int q(z \mid x) \log \frac{q(z \mid x)}{p(z)}\, dz
              = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] + D(q(z \mid x) \,\|\, p(z \mid x)) - D(q(z \mid x) \,\|\, p(z)).

Moving D(q(z|x) || p(z|x)) to the left-hand side, we get

    \log p(x) - D(q(z \mid x) \,\|\, p(z \mid x)) = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - D(q(z \mid x) \,\|\, p(z)).   (3)

Using equations (2) and (3), we get another variational representation of the log-likelihood:

    \log p(x) = \max_q \underbrace{\mathbb{E}_{q(z \mid x)}[\log p(x \mid z)]}_{(A)} - \underbrace{D(q(z \mid x) \,\|\, p(z))}_{(B)}.   (4)

Computing the term (A) and its gradient with respect to q is tricky. Kingma and Welling [9] suggest using i.i.d. samples from q(z|x) to compute a Monte Carlo estimate of (A), and a clever re-parameterization trick (for variance reduction) to enable backpropagation of gradients with respect to q through the samples. Also, a convenient (but rich enough) choice of Gaussian forms for p(z) and q(z|x) gives a closed form for the term (B). To make our statements precise, assume the following: p(z) = N(0, I); p(z|x) ≈ q(z|x) = N(μ_z(x), Σ_z(x)), where Σ_z(x) = diag(σ_z²(x)) is a diagonal matrix with σ_z²(x) = (σ_z^(1)(x)², ..., σ_z^(h)(x)²) and h the latent dimension; and p(x|z) = N(μ_x(z), I).

For image synthesis, the observed data x is the image, while z (the hidden variable) can be considered as the code for the image that captures all the meaningful information of that image. This allows us to consider q(z|x) as the encoder, while p(x|z) can be understood as the decoder. Note that μ_x(z) denotes the mean of the distribution of x for a fixed z, and is a map from the code space to a vector in the data space. The mappings μ_z(x), σ_z(x) and μ_x(z) are built using neural networks. Let θ denote the parameters of the encoder (the networks computing μ_z(·) and σ_z(·)), and φ denote the parameters of the decoder (the network computing μ_x(·)). Let r(x_i) = μ_x(z_i) and z_i denote the reconstructed image and code, respectively, for input x_i. Putting together the pieces, equation (4) reduces to the following optimization problem (a sketch of its implementation follows):

    \min_{\theta, \phi} \frac{1}{N} \sum_{i=1}^{N} \Bigg( \underbrace{\| x_i - r(x_i) \|^2}_{\text{pixel loss or decoder loss}} + \frac{1}{2} \underbrace{\Bigg( \sum_{j=1}^{h} \Big( \sigma_z^{(j)}(x_i)^2 - \log \sigma_z^{(j)}(x_i)^2 - 1 \Big) + \| \mu_z(x_i) \|^2 \Bigg)}_{\text{latent loss or encoder loss}} \Bigg).
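A minimal TensorFlow sketch of this objective, including the re-parameterization trick, is given below. It assumes flattened image batches and an encoder that outputs the mean and log-variance of q(z|x); the names are illustrative rather than our exact implementation.

```python
import tensorflow as tf

def vae_loss(x, mu_z, log_sigma2_z, decoder):
    """ELBO-based loss of equation (4) under the Gaussian assumptions above."""
    # Re-parameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients flow through the sampling step.
    eps = tf.random_normal(tf.shape(mu_z))
    z = mu_z + tf.exp(0.5 * log_sigma2_z) * eps
    x_recon = decoder(z)  # r(x) = mu_x(z)
    # Monte Carlo estimate of term (A) up to constants: the pixel loss.
    pixel_loss = tf.reduce_sum(tf.square(x - x_recon), axis=1)
    # Term (B), D(q(z|x) || p(z)), in closed form for diagonal Gaussians:
    # the latent loss.
    latent_loss = 0.5 * tf.reduce_sum(
        tf.exp(log_sigma2_z) + tf.square(mu_z) - 1.0 - log_sigma2_z, axis=1)
    return tf.reduce_mean(pixel_loss + latent_loss)
```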

3 Baseline Models

Kingma and Welling used two fully-connected layers for the encoder and the decoder in their first paper [9] on VAEs. Mathematically, the encoder is

    h_z(x) = \tanh(W_1 x + b_1), \quad \mu_z(x) = W_2 h_z(x) + b_2, \quad \log \sigma_z^2(x) = W_3 h_z(x) + b_3.

Then z is generated according to N(z; μ_z(x), diag(σ_z²(x))). Afterwards, the decoder is

    h_x(z) = \tanh(W_4 z + b_4), \quad \mu_x(z) = W_5 h_x(z) + b_5, \quad \log \sigma_x^2(z) = W_6 h_x(z) + b_6.

In the paper, they report results using VAEs trained on the MNIST and Frey Face datasets. Sample images generated by their network can be seen in Fig. 2. A minimal sketch of this baseline appears below.

Fig. 2: Generated digits (left) and faces (right) by the baseline model as reported in [9].

We were able to implement their model and reproduce the results on the MNIST dataset. However, the reported model had poor reconstruction and generation quality for CIFAR-10 (as will be shown in Fig. 5 in Section 5). Similarly poor results for CIFAR-10 have been reported in the literature even with complex networks [4]. We discuss the possible reasons behind this in Section 6.
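A minimal sketch of this baseline in TensorFlow follows; the layer widths are illustrative (the paper's exact sizes differ), and the sampling step reuses the re-parameterization trick of Section 2.

```python
import tensorflow as tf

d, n_hidden, h = 784, 500, 20  # input dim (MNIST), hidden units, latent dim

def dense(t, n_in, n_out, name):
    W = tf.get_variable(name + '_W', [n_in, n_out],
                        initializer=tf.random_normal_initializer(stddev=0.01))
    b = tf.get_variable(name + '_b', [n_out], initializer=tf.zeros_initializer())
    return tf.matmul(t, W) + b

x = tf.placeholder(tf.float32, [None, d])

# Encoder: one tanh layer, then linear maps to mean and log-variance.
h_z = tf.tanh(dense(x, d, n_hidden, 'enc'))
mu_z = dense(h_z, n_hidden, h, 'mu_z')
log_sigma2_z = dense(h_z, n_hidden, h, 'log_sigma2_z')

# Sample z ~ N(mu_z, diag(sigma_z^2)) via re-parameterization.
z = mu_z + tf.exp(0.5 * log_sigma2_z) * tf.random_normal(tf.shape(mu_z))

# Decoder: mirror structure.
h_x = tf.tanh(dense(z, h, n_hidden, 'dec'))
mu_x = dense(h_x, n_hidden, d, 'mu_x')
log_sigma2_x = dense(h_x, n_hidden, d, 'log_sigma2_x')
```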

4 Our Network and Implementation

When randomness in images is modeled at the pixel level, the implied distance between the reconstructed image and the original image is also measured pixel-by-pixel. For example, when the pixel values of an image are modeled to come from a normal distribution around a mean, the distance becomes an l2 distance, as in equation (4). It is well known that models trained using such a pixel-by-pixel l2 loss suffer from a fundamental problem: the loss is incapable of capturing perceptual difference and spatial correlation between images [12]. A slight translation of pixels creates no perceptual difference to human eyes, but the l2 loss between the original image and the translated image will be large. This is bound to affect the visual quality of the images generated by the fitted model.

It is well known that early layers of pre-trained CNNs tend to capture spatial information of input images. It has also been observed that many filters resemble the classical Gabor filters, which are known to capture many shapes and spatial properties of an image (see, e.g., the paper by Yosinski et al. [20]). We believe that putting a penalty on the difference between the activations of the original and reconstructed images when passed through such filters helps impose better spatial properties on the reconstructed images. Thus we adopt two changes: (1) use convolutional layers in the encoder and decoder; and (2) use an l2 loss on the activations/features (referred to as the feature loss) from the first layer of a VGGNet [16] pre-trained on ImageNet [2], computed for the original and reconstructed images (a sketch follows Section 4.1). The details of the final network we use are outlined in Fig. 3. We refer to this network as the CNN model/network.

4.1 Implementation and Tools

We implemented our network in TensorFlow [1] using Python. We chose TensorFlow because it has a simple interface to build and train networks, has pre-trained networks available for extracting image features, and the TensorBoard tool helps visualize and debug networks in a convenient way. Our network was trained on Amazon Web Services and Google Cloud Platform servers. The availability of two cloud computing services helped us explore hyper-parameters and network configurations in parallel.
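A minimal sketch of the feature loss follows. It assumes the first-layer kernel and bias of VGG16 have been loaded from a pre-trained checkpoint into NumPy arrays (the loading helper is not shown and depends on the setup); the 3 x 3 x 3 x 64 shape is that of VGG16's conv1_1 layer.

```python
import tensorflow as tf

def feature_loss(x, x_recon, vgg_conv1_kernel, vgg_conv1_bias):
    """l2 distance between first-layer VGG activations of the original and
    reconstructed images. The kernel/bias arguments are NumPy arrays taken
    from a pre-trained VGG16 checkpoint and are held fixed during training."""
    k = tf.constant(vgg_conv1_kernel)  # shape [3, 3, 3, 64] for conv1_1
    b = tf.constant(vgg_conv1_bias)    # shape [64]

    def features(img):  # img: [batch, height, width, 3]
        conv = tf.nn.conv2d(img, k, strides=[1, 1, 1, 1], padding='SAME')
        return tf.nn.relu(tf.nn.bias_add(conv, b))

    return tf.reduce_sum(tf.square(features(x) - features(x_recon)))
```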

Fig. 3: Our network consists of an encoder and a decoder with convolutional and fully-connected layers. In addition, we use the output of the first convolutional layer of a VGGNet pre-trained on the ImageNet dataset as a feature extractor on the input image and the decoder output. The features are used in the loss function together with the latent and pixel loss. When the encoder is fed with an image, the decoder should reconstruct it; when the decoder is fed with noise, it should generate a natural image. The layers are summarized below (a code sketch follows):

Encoder:
  CONV: 3 x 3 x 3 x 16
  CONV: 3 x 3 x 16 x 16
  MAX POOL: 2 x 2
  CONV: 3 x 3 x 16 x 32
  CONV: 3 x 3 x 32 x 32
  MAX POOL: 2 x 2
  2 x FC: 2048 x 100 (one for the mean, one for the variance)

Decoder:
  FC: 100 x 2048
  UPSAMPLING by 2
  CONV: 3 x 3 x 32 x 16
  CONV: 3 x 3 x 16 x 16
  UPSAMPLING by 2
  CONV: 3 x 3 x 16 x 3
  CONV: 3 x 3 x 3 x 3
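Below is a sketch of the Fig. 3 architecture using the tf.layers API (a newer API than our original implementation used, so this is a sketch under that assumption, not our exact code); nearest-neighbour upsampling is one plausible choice for the "UPSAMPLING by 2" steps.

```python
import tensorflow as tf

def conv(t, filters):
    return tf.layers.conv2d(t, filters, 3, padding='same', activation=tf.nn.relu)

def encoder(x):  # x: [batch, 32, 32, 3] CIFAR-10 images
    t = conv(conv(x, 16), 16)
    t = tf.layers.max_pooling2d(t, 2, 2)      # -> 16 x 16 x 16
    t = conv(conv(t, 32), 32)
    t = tf.layers.max_pooling2d(t, 2, 2)      # -> 8 x 8 x 32 = 2048 features
    t = tf.reshape(t, [-1, 2048])
    mu_z = tf.layers.dense(t, 100)            # FC 2048 x 100 (mean head)
    log_sigma2_z = tf.layers.dense(t, 100)    # FC 2048 x 100 (variance head)
    return mu_z, log_sigma2_z

def upsample(t):  # nearest-neighbour upsampling by 2
    s = tf.shape(t)
    return tf.image.resize_nearest_neighbor(t, [2 * s[1], 2 * s[2]])

def decoder(z):
    t = tf.reshape(tf.layers.dense(z, 2048), [-1, 8, 8, 32])
    t = conv(conv(upsample(t), 16), 16)       # -> 16 x 16 x 16
    t = conv(upsample(t), 3)                  # -> 32 x 32 x 3
    return tf.layers.conv2d(t, 3, 3, padding='same')  # final 3 x 3 x 3 x 3 conv
```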

5 Our Results

In this section, we present and discuss our final results obtained by training the CNN network with and without the feature loss discussed in Section 4; the results are illustrated in Figs. 4-6. We get good results on the easier MNIST dataset: Fig. 4 shows images that the trained network generates randomly and, as can be seen, they look quite natural. For CIFAR-10, however, the results are not as good. We compare four settings, using the baseline and CNN models, with pixel loss and with pixel plus feature loss. We make the following observations:

Reconstruction: Fig. 5 shows that the reconstructed images using the CNN model (right column) are much better than those of the fully-connected baseline model. However, it is not clear whether including the feature loss is helpful. Row 2 shows reconstructed images that do not use the feature loss, while row 3 displays results with pixel and feature loss (the latent loss is always used). We remark that reconstruction is an easier task than generation.

Generation: Results with the CNN model are presented in Fig. 6. When we over-fit the network on a small dataset, we get decent-looking images from generation, but they look like replicas of the input images. Training the model on the whole dataset, however, turns out to be quite hard: the generated images look very blurry, which agrees with observations noted in other works in the literature.

Fig. 4: When trained on MNIST, our VAE can generate realistic handwritten digits.

Fig. 5: Using convolutional layers in the encoder/decoder we get better reconstruction on CIFAR-10 (columns: input images, baseline model, our model; rows: pixel loss, pixel + feature loss). However, using the loss on features did not yield significant improvements.

Fig. 6: Images generated by our VAE when trained on CIFAR-10 (small dataset vs. large dataset) are not realistic.

5.1 Remarks

We list some possible changes to the network that may lead to better results; we do not contrast with other techniques or models here, and only present tweaks suited to our model. As described in Section 4, we used the first layer of a pre-trained VGGNet for the feature loss. We tried a variety of weighting schemes for the three terms in the loss, namely the latent, pixel and feature losses, and used the Adam method [8] for training the network. In our experiments, we saw that the latent loss was sensitive to the chosen weights, and it was relatively easy to get an unbounded latent loss. It would be interesting to try optimization methods that stabilize the learning process, such as gradient clipping [15], to see if we can train our network for weight combinations that would otherwise fail with a vanilla Adam method; a sketch follows.
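Here is a minimal sketch of Adam with global-norm gradient clipping in TensorFlow; loss below is a stand-in scalar for the weighted sum of the latent, pixel and feature losses, and the clipping norm would need tuning.

```python
import tensorflow as tf

w = tf.Variable(1.0)
loss = tf.square(w)  # stand-in for the weighted latent + pixel + feature loss

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)  # cap gradient norm
train_op = optimizer.apply_gradients(list(zip(clipped, variables)))
```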

Next, the choice of using only the first layer was motivated by the well-known observation that the early layers of CNNs tend to resemble Gabor filters, which are also used extensively in classical computer vision to extract image features. We believed that modeling randomness at the feature level, and not simply at the pixel level, would make the model learn the spatial structure of natural images better. As further experiments, different pre-trained networks (such as ResNet [5] or GoogLeNet [18]) and different layers of those networks could be tried. Because building the current model and experimenting with training it took considerable time, we decided to conclude the project with our current results.

6 State of the Art

We discuss two state-of-the-art VAE models: 1) the Deep Recurrent Attentive Writer (DRAW) [4], and 2) the VAE with Inverse Auto-Regressive Flow (IAF) [10]. DRAW uses an attention model for iterative construction of complex images. The paper reports good reconstruction of handwritten digits on the MNIST dataset by tracing lines much like a person with a pen. However, the results on CIFAR-10 are not as good: some randomly generated images are presented in Fig. 7a, and they appear blurry. In the VAE with IAF, Kingma et al. [10] use multiple invertible parameterized transformations on the hidden variable, besides using a ResNet for the encoder and decoder. This enables them to approximate the intractable posterior better, thereby improving the lower bound on the log-likelihood. Their trained network generates sharper images (Fig. 7b) which, however, look unrealistic on closer inspection.

Fig. 7: Generated images from the state-of-the-art models for CIFAR-10: (a) DRAW, (b) IAF.

The closest model in the literature to ours is a recent work by Hou et al. [7]. There, VGGNet features are used in the loss, much like in our loss function. Good results are reported for reconstruction and generation of images from the CelebFaces Attributes dataset (CelebA) [14], which has more than 200,000 images. They report an improvement when including the feature loss, in contrast to using just the pixel loss (Fig. 8).

6.1 Remarks

The images generated by DRAW for CIFAR-10 are blurry, and the VAE with IAF generates sharp but unrealistic images. The results in Fig. 8 look promising, and that motivated us to try harder. However, the authors made no comment on the performance of the feature loss on the CIFAR-10 dataset. We believe that, in contrast to CIFAR-10, the CelebA dataset (for which they report results) can be said to have convenient structure, as the images are quite homogeneous (faces of human beings).

Fig. 8: The feature-consistent VAE [7] shows improvements when the loss incorporates features extracted using a pre-trained deep network. (top) Images generated using only the pixel loss appear blurry; (bottom) when the feature loss is incorporated, the images look sharper.

CIFAR-10 has been a tough dataset for VAEs; in fact, Gregor et al. remark after their results using DRAW: "CIFAR-10 is very diverse, and with only 50,000 training examples it is very difficult to generate realistic-looking objects without over-fitting (in other words, without copying from the training set)." We too observe a similar phenomenon.

7 Discussion

Often, the objective of training a model is to use the model for some specific purpose. Since learning the model is usually cast as an optimization problem, the choice of the objective function and the constraints should match what the learned model will be used for. For many tasks, learning a generative model can be converted to finding the maximum-likelihood model for the data. However, the natural question is: if the goal is to generate natural-looking images, should we try to learn the maximum-likelihood model? Classical statistics results show that, in the limit of infinite data and a well-specified model class (one where the true model belongs to the search space), the maximum likelihood estimate (MLE) of the model is consistent and recovers the true model. But in most applications, the data is finite and the model is mis-specified. Consequently, one needs to be careful about whether the MLE is the right approach for the task at hand.

Theis et al. [19] argue that if the goal is to generate natural-looking images, then the MLE is not a perfect match. Let P denote the unknown distribution and let Q denote the approximate distribution that we learn using the dataset at hand. The authors argue that maximizing the likelihood is approximately the same as solving

    \min_Q D(P \,\|\, Q).   (5)

On the other hand, using ideas from computational cognitive science, they claim that

    \min_Q D(Q \,\|\, P)   (6)

can be an idealized objective for training the model. In the finite-data, mis-specified case, the model learned by solving (5) tends to overgeneralize and put mass on areas where P has zero mass, leading to samples that look unnatural. On the other hand, in the same scenario, models learned from (6) tend to focus on the good modes. Put simply, the MLE tends to overgeneralize, while the solution of (6) under-generalizes. Such a claim puts a question mark over whether the VAE approach, which focuses on finding the MLE, is the right way to learn a generative model for natural-looking images. Fig. 9 (taken from [19]) illustrates this using a simple toy example. Here P is a mixture of Gaussians while Q is a fit from among the isotropic Gaussian distributions with equal variance along the two axes. While the model learned by minimizing the KL divergence as in (5) puts a lot of mass on non-data regions, minimizing other divergences like maximum mean discrepancy (MMD) or Jensen-Shannon divergence (JSD) gives a model that fits one of the modes well but ignores other parts of the data. A small numerical sketch of this trade-off follows.

Fig. 9: Illustration of the trade-off across various fits of an isotropic Gaussian to a dataset drawn from a mixture of Gaussians (panels: data, KLD, MMD, JSD).
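To make the trade-off concrete, here is a small self-contained numerical sketch in one dimension (our own illustrative construction, not the experiment of [19]): fit a single Gaussian Q to a two-mode mixture P by grid search under each objective.

```python
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

def normal(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * normal(x, -3, 0.7) + 0.5 * normal(x, 3, 0.7)  # the data density P

def kl(a, b, eps=1e-12):  # D(a || b) approximated by a Riemann sum
    return np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx

candidates = [(mu, s) for mu in np.linspace(-5, 5, 101)
              for s in np.linspace(0.2, 5, 97)]
fit_fwd = min(candidates, key=lambda ms: kl(p, normal(x, *ms)))  # objective (5)
fit_rev = min(candidates, key=lambda ms: kl(normal(x, *ms), p))  # objective (6)

print('min D(P||Q):', fit_fwd)  # spreads over both modes: mu ~ 0, large s
print('min D(Q||P):', fit_rev)  # locks onto one mode: mu ~ +/-3, s ~ 0.7
```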

Lessons Learned

In this project, we tried to generate natural-looking images using VAEs, training them on the MNIST and CIFAR-10 datasets. We have seen that using deep CNNs yields better performance for reconstruction and generation of complex images compared to using only shallow fully-connected networks. We tried improving the naturalness of the generated images by incorporating a loss based on image features in the objective function. Although there is prior work [7] showing improvements on other datasets (CelebA), we could not observe significant improvements on the CIFAR-10 dataset. There is significant literature on the hardness of generating natural-looking images from CIFAR-10, and our results align with those statements. On the implementation side, we have seen that debugging and training a big network takes a lot of coding effort and time, even with compute clusters. For that, visualization and debugging tools such as TensorBoard are very useful. Furthermore, we have seen that good initialization of variables is important for convergence of training, and techniques such as Xavier initialization help a lot [3].

Team Contributions

Both team members contributed similar amounts of effort. They shared efforts in both the theoretical understanding and the implementation of the various models. TensorFlow was new to the team, and initially time was devoted to learning how to use it and then experimenting with the tool. Orhan, having more interest in the implementation, had an edge in exploring various networks, finding good tools and implementing them efficiently. Raaz had useful discussions with Orhan on good coding practices and TensorBoard. Raaz, having more interest in the theory side, had an edge in learning the different directions and discussions in the literature. Orhan had useful discussions with Raaz on various works on the theory behind VAEs and on other generative models. The team members learned how to install and run TensorFlow on compute servers; Raaz learned about using Amazon Web Services, Orhan learned about using Google Cloud Platform, and they taught each other how to use the platform they had learned. Both spent equal time trying different networks and parameters in order to improve the preliminary results. The marginal difference in efforts, if any, was compensated by time devoted to preparing the slides/poster/report/GitHub page for the project. To conclude, the team members think that Orhan's contribution is 50% and Raaz's contribution is 50%. The interesting way the team members came up with this contribution breakdown was not by providing supportive evidence to increase their own shares, but by arguing how valuable the other member's contributions were to this project, and that without their efforts, this project would not have been the same.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE.

[3] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, volume 9.

[4] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. DRAW: A recurrent neural network for image generation. arXiv preprint.

[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint.

[6] Geoffrey E. Hinton, Peter Dayan, Brendan J. Frey, and Radford M. Neal. The wake-sleep algorithm for unsupervised neural networks. Science, 268(5214):1158.

[7] Xianxu Hou, Linlin Shen, Ke Sun, and Guoping Qiu. Deep feature consistent variational autoencoder. arXiv preprint, October 2016.

[8] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint.

[9] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.

[10] Diederik P. Kingma, Tim Salimans, and Max Welling. Improving variational inference with inverse autoregressive flow. arXiv preprint.

[11] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images.

[12] Jon C. Leachtenauer, William Malila, John Irvine, Linda Colburn, and Nanette Salvaggio. General image-quality equation: GIQE. Applied Optics, 36(32).

[13] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits.

[14] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV).

[15] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In ICML.

[16] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.

[17] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. How to train deep variational autoencoders and probabilistic ladder networks. arXiv preprint.

[18] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[19] Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. arXiv preprint.

[20] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems.


More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

Background-Foreground Frame Classification

Background-Foreground Frame Classification Background-Foreground Frame Classification CS771A: Machine Learning Techniques Project Report Advisor: Prof. Harish Karnick Akhilesh Maurya Deepak Kumar Jay Pandya Rahul Mehra (12066) (12228) (12319) (12537)

More information

arxiv: v2 [cs.cv] 26 Jan 2018

arxiv: v2 [cs.cv] 26 Jan 2018 DIRACNETS: TRAINING VERY DEEP NEURAL NET- WORKS WITHOUT SKIP-CONNECTIONS Sergey Zagoruyko, Nikos Komodakis Université Paris-Est, École des Ponts ParisTech Paris, France {sergey.zagoruyko,nikos.komodakis}@enpc.fr

More information

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Tianyu Wang Australia National University, Colledge of Engineering and Computer Science u@anu.edu.au Abstract. Some tasks,

More information

Gradient of the lower bound

Gradient of the lower bound Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation gaurav.pandey@csa.iisc.ernet.in Objective Given a training set that comprises image and image-level

More information

Auxiliary Guided Autoregressive Variational Autoencoders

Auxiliary Guided Autoregressive Variational Autoencoders Auxiliary Guided Autoregressive Variational Autoencoders Thomas Lucas, Jakob Verbeek To cite this version: Thomas Lucas, Jakob Verbeek. Auxiliary Guided Autoregressive Variational Autoencoders. 2017.

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks

An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks Report for Undergraduate Project - CS396A Vinayak Tantia (Roll No: 14805) Guide: Prof Gaurav Sharma CSE, IIT Kanpur, India

More information

Adversarial Symmetric Variational Autoencoder

Adversarial Symmetric Variational Autoencoder Adversarial Symmetric Variational Autoencoder Yunchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li and Lawrence Carin Department of Electrical and Computer Engineering, Duke University

More information

Capsule Networks. Eric Mintun

Capsule Networks. Eric Mintun Capsule Networks Eric Mintun Motivation An improvement* to regular Convolutional Neural Networks. Two goals: Replace max-pooling operation with something more intuitive. Keep more info about an activated

More information

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the

More information

Autoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN.

Autoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN. Stacked Denoising Sparse Variational (Adapted from Paul Quint and Ian Goodfellow) Stacked Denoising Sparse Variational Autoencoding is training a network to replicate its input to its output Applications:

More information