Naturalistic Image Synthesis Using Variational Auto-Encoder
Submitted as a project report for EECS 294 in Fall 2016 by Raaz Dwivedi (raaz.rsk@eecs.berkeley.edu) and Orhan Ocal (ocal@eecs.berkeley.edu)
Abstract

We develop a deep generative model for naturalistic image synthesis using variational auto-encoders (VAEs). Our model uses convolutional and fully-connected layers and includes an $\ell_2$ loss on features extracted from a VGGNet that was pre-trained for classification on the ImageNet dataset. The feature loss is intended to enhance the naturalness of the generated images. This design deviates from the traditional fully-connected models, which use only pixel and latent losses for training VAEs. We show that using convolutional layers improves both reconstruction and generation of images from the trained network. Although we obtain good results on the MNIST handwritten digits dataset, we were unable to generate realistic images using the more diverse CIFAR-10 dataset. Furthermore, we could not conclude whether incorporating feature consistency in the loss function led to better results. Hence, our results deviate from the findings of a recent paper by Hou et al. [7], where the authors used the CelebFaces Attributes (CelebA) dataset and showed that incorporating a feature loss from a pre-trained VGGNet helped their VAE generate more realistic images than existing models in the literature.

Contents

1 Introduction
1.1 Problem Statement
1.2 Datasets
1.3 Evaluating the Generative Model
1.4 Organization
2 Theory behind VAE
3 Baseline Models
4 Our Network and Implementation
4.1 Implementation and Tools
5 Our Results
5.1 Remarks
6 State of the Art
6.1 Remarks
7 Discussion
1 Introduction

Understanding the nature around us has been an interesting problem for centuries. Teaching the same to a machine, albeit a more recent endeavor, has led to many interesting research problems, and the scientific community has been working hard to model nature in many fields. The philosophy behind trying to understand the nature around us can be attributed to Feynman's quote: "What I cannot create, I do not understand." Learning a generative model is one of the many approaches that try to accomplish this goal.

Given a dataset, a generative model is a model for randomly generating observable values similar to that dataset. The goal goes beyond reproducing the dataset: the model should be able to generate images that are not replicas of members of the training data, yet are similar to the data. Such a model is helpful for many purposes. If the model is smaller than the data, one has successfully compressed the data, which makes saving and transferring the data for many machine learning tasks much easier. To name a few other uses, a generative model can provide larger training sets for neural networks, de-noise images, and in-paint missing regions.

Besides these domain-specific examples, generative models are also useful in an abstract sense for the field of Artificial Intelligence. Human beings are intelligent because, with time and experience, they become very good at predicting the outcome of many actions, and they decide how to act by taking the anticipated outcome into account. This is because, with time, they learn generative models for various natural processes. Thus, learning a generative model is a necessity for a robot that is to become as intelligent as a human and decide which action to perform.

1.1 Problem Statement

We want to learn a generative model for natural images.
For this problem, various approaches have been taken in the deep learning community, including Variational Auto-Encoders (VAE), Generative Adversarial Networks (GAN) and Pixel Recurrent Neural Networks. In this project we target synthesizing natural-looking images using VAEs, which were introduced by Kingma and Welling [9] and have been widely researched since then. We start with that first work and build up to a very recent work [7].

1.2 Datasets

We use two datasets that are well known in the machine learning community. The first is the MNIST dataset of handwritten digits [13], with a training set of 60,000 samples and a test set of 10,000 samples. The handwritten digits have been size-normalized and centered in a 28 x 28 monochromatic pixel image. The second is the CIFAR-10 dataset of natural images [11]. It consists of 60,000 32 x 32 pixel RGB color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The classes are mutually exclusive (for example, there is no overlap between automobiles and trucks).
1.3 Evaluating the Generative Model

In the literature [9, 6, 17, 7, 19], we found three key ways to quantify the quality of a VAE and to compare it across models: (1) the likelihood of the training and test data under the trained VAE; (2) the visual appeal of the reconstructed and generated images; (3) the classification accuracy on unlabelled images when the network is trained with partially labelled data. In this project, because our goal is to generate natural-looking images rather than to measure how well the VAEs capture the likelihood function, we evaluate our models using the visual appeal of the reconstructed and generated images.

We build a model that has two parts: an encoder and a decoder. The encoder takes an input image and outputs useful features of the image. The decoder, on the other hand, constructs an image given the features of the image. Informally (to be made precise in Section 2), the decoder can be seen as the generative model, and the encoder can be seen as providing the codebook for the model. The two operations that a VAE targets are:

(a) Reconstruction: If we input an image to the encoder and pass its output (the code of the input image) through the decoder, the decoder should output an image that matches the input image.

(b) Generation: If we input a random signal (noise) to the decoder, it should use it to generate a random code from the codebook and use that code to output a natural-looking image.

Fig. 1 illustrates these two operations. To be consistent with the literature, we shall refer to the code as the latent variable.

Fig. 1: (a) Reconstruction (top): an input image is passed through both the encoder and the decoder, via the latent variable, to give the output image. (b) Generation (bottom): when fed with noise, the decoder outputs a natural-looking image.

1.4 Organization

The organization of this report is as follows.
We briefly discuss the theoretical foundations of VAEs in Section 2. We then present, in Section 3, the baseline model from the work that introduced VAEs to the literature. We discuss the details of the network that we implemented and the results in Sections 4 and 5 respectively. In Section 6, we report the performance of various state of the art models and contrast them with our results. We end, in Section 7, with an ongoing discussion in the deep learning community about how to approach learning a generative model.

2 Theory behind VAE

Suppose we are given a dataset $\{x_i, i = 1, \ldots, N\}$ consisting of $N$ independent and identically distributed samples of some random variable $x$ whose distribution is unknown. We assume that the generation of $x$ is a two-step process: first, a value $z$ is generated from some prior distribution $p_\theta(z)$, and then a value $x$ is drawn from a conditional distribution $p_\theta(x \mid z)$. The issues are that $\theta$ is unknown and that we only observe samples of $x$. We usually refer to $x$ as the data, while $z$ (the unobserved random variable) is called the hidden/latent variable (the code). We further assume that the distributions involved belong to a parametric class $\Theta$ indexed by the parameter $\theta$, and that they have smooth densities.

A popular scheme to estimate $\theta$ is to approximate it with $\hat\theta_{\mathrm{MLE}}$, the maximum-likelihood estimate for the observed data:

\[ \hat\theta_{\mathrm{MLE}} = \arg\max_{\theta \in \Theta} \; p_\theta(x_1, \ldots, x_N). \]

When $\Theta$ is complex, maximizing the likelihood can become a hard problem. Often, adopting the viewpoint of two-step generation with a hidden variable comes in handy. One can write a lower bound on the log-likelihood function as follows (using $p := p_\theta$ to de-clutter notation):

\[
\begin{aligned}
\log p(x) &= \log \int_z p(x, z)\,dz = \log \int_z \frac{p(x)\,p(z \mid x)}{q(z \mid x)}\, q(z \mid x)\,dz \\
&= \log \mathbb{E}_{z \sim q(\cdot \mid x)}\left[\frac{p(x)\,p(z \mid x)}{q(z \mid x)}\right] \\
&\ge \mathbb{E}_{z \sim q(\cdot \mid x)}\left[\log\left(\frac{p(x)\,p(z \mid x)}{q(z \mid x)}\right)\right],
\end{aligned}
\]

where the inequality follows from Jensen's inequality.
In fact, it turns out that if we optimize over $q$, this becomes an equality, that is, we can express $\log p(x)$ as follows:

\[ \log p(x) = \max_q \; \mathbb{E}_{z \sim q(\cdot \mid x)}\left[\log\left(\frac{p(x)\,p(z \mid x)}{q(z \mid x)}\right)\right] \tag{1} \]

\[ \phantom{\log p(x)} = \max_q \; \left[\log p(x) - D\big(q(z \mid x) \,\|\, p(z \mid x)\big)\right], \tag{2} \]

where $D(\cdot \,\|\, \cdot)$ is the Kullback-Leibler divergence, and the optimizer $q^*$ (the argmax) is given by $q^*(z \mid x) = p(z \mid x)$. Converting the objective to a maximization (or a minimization) problem as above is known as the variational principle. It is a technique to convert a hard problem into a simpler one; for example, the Expectation-Maximization algorithm optimizes equation (2) over $p(\cdot)$ and $q(\cdot \mid x)$ iteratively, in turns, to try to maximize the likelihood function.
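The variational representation in equations (1) and (2) can be checked numerically on a small discrete example. The sketch below is only an illustration: the three-valued latent variable and the numbers in `prior` and `likelihood` are made-up assumptions, not part of the report. It verifies that the gap between $\log p(x)$ and the Jensen lower bound equals the Kullback-Leibler divergence from $q$ to the true posterior, and that the gap vanishes when $q$ is the posterior.

```python
import math

# Hypothetical toy model: z takes three values and x is one fixed observation.
prior = [0.5, 0.3, 0.2]           # p(z)
likelihood = [0.10, 0.40, 0.70]   # p(x | z) evaluated at the observed x

joint = [pz * px for pz, px in zip(prior, likelihood)]  # p(x, z) = p(x) p(z | x)
evidence = sum(joint)                                    # p(x)
posterior = [j / evidence for j in joint]                # p(z | x)


def lower_bound(q):
    """E_{z~q}[log(p(x, z) / q(z))], the right-hand side of the Jensen bound."""
    return sum(qz * math.log(j / qz) for qz, j in zip(q, joint) if qz > 0)


def kl(q, p):
    """Kullback-Leibler divergence D(q || p) for discrete distributions."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)


log_evidence = math.log(evidence)
uniform_q = [1 / 3, 1 / 3, 1 / 3]

# Gap between log p(x) and the bound for an arbitrary q and for the posterior.
gap_uniform = log_evidence - lower_bound(uniform_q)
gap_posterior = log_evidence - lower_bound(posterior)
```

In this toy setting, `gap_uniform` coincides with `kl(uniform_q, posterior)` up to rounding, which is exactly the statement of equation (2).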
Using similar techniques and Bayes' rule, $p(x) = p(z)\,p(x \mid z)/p(z \mid x)$, one can derive an alternate equality for $\log p(x)$ as follows:

\[
\begin{aligned}
\log p(x) &= \mathbb{E}_{q(z \mid x)} \log p(x) \\
&= \mathbb{E}_{q(z \mid x)}\left[\log\left(\frac{p(z)\,p(x \mid z)}{p(z \mid x)} \cdot \frac{q(z \mid x)}{q(z \mid x)}\right)\right] \\
&= \mathbb{E}_{q(z \mid x)} \log p(x \mid z) + \int_z q(z \mid x) \log\frac{q(z \mid x)}{p(z \mid x)}\,dz - \int_z q(z \mid x) \log\frac{q(z \mid x)}{p(z)}\,dz \\
&= \mathbb{E}_{q(z \mid x)} \log p(x \mid z) + D\big(q(z \mid x) \,\|\, p(z \mid x)\big) - D\big(q(z \mid x) \,\|\, p(z)\big).
\end{aligned}
\]

Moving $D(q(z \mid x) \,\|\, p(z \mid x))$ to the left hand side, we get

\[ \log p(x) - D\big(q(z \mid x) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - D\big(q(z \mid x) \,\|\, p(z)\big). \tag{3} \]

Using equations (2) and (3), we get another variational representation of the log-likelihood:

\[ \log p(x) = \max_q \; \Big\{ \underbrace{\mathbb{E}_{q(z \mid x)} \log p(x \mid z)}_{(A)} - \underbrace{D\big(q(z \mid x) \,\|\, p(z)\big)}_{(B)} \Big\}. \tag{4} \]

Computing the term (A) and its gradient with respect to $q$ is tricky. Kingma and Welling [9] suggest using i.i.d. samples from $q(z \mid x)$ to compute a Monte Carlo estimate of (A), together with a clever re-parameterization trick (for variance reduction) that enables backpropagation of gradients with respect to $q$ through the samples. In addition, a convenient (but rich enough) choice of Gaussian forms for $p(z)$ and $q(z \mid x)$ gives a closed form for the term (B). To make our statements precise, assume the following:

\[ p(z) \sim \mathcal{N}(0, I), \qquad p(z \mid x) \approx q(z \mid x) = \mathcal{N}(\mu_z(x), \Sigma_z(x)), \qquad p(x \mid z) \sim \mathcal{N}(\mu_x(z), I), \]

where $\Sigma_z(x) = \mathrm{diag}(\sigma_z^2(x))$ is a diagonal matrix.

For image synthesis, the observed data $x$ is the image, while $z$ (the hidden variable) can be considered as the code for the image, which captures all the meaningful information of that image. This allows us to consider $q(z \mid x)$, the approximation of $p(z \mid x)$, as the encoder, while $p(x \mid z)$ can be understood as the decoder. Note that $\mu_x(z)$ denotes the mean of the distribution of $x$ for a fixed $z$, and is a map from the code space to a vector in the data space. The mappings $\mu_z(x)$, $\sigma_z(x)$ and $\mu_x(z)$ are built using neural networks. Let $\theta = \{\mu_z(x), \sigma_z(x)\}$ denote the parameters of the encoder, and let $\phi = \{\mu_x(z)\}$ denote the parameters of the decoder. Let $r(x_i)$ and $z_i$ denote the reconstructed image and the code respectively, for input $x_i$.
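Putting together the pieces, equation (4) reduces to the following optimization problem:

\[
\min_{\theta, \phi} \; \frac{1}{N} \sum_{i=1}^{N} \Bigg( \underbrace{\big\| x_i - r(x_i) \big\|_2^2}_{\text{pixel loss or decoder loss}} + \underbrace{\big\| \mu_z(x_i) \big\|_2^2 + \sum_{j=1}^{h} \Big( -\log \sigma_z^{(j)}(x_i)^2 + \sigma_z^{(j)}(x_i)^2 \Big)}_{\text{latent loss or encoder loss}} \Bigg),
\]

where $r(x_i) = \mu_x(z_i)$ is the decoder mean evaluated at the code $z_i$ of $x_i$, $h$ is the dimension of the code, and $\sigma_z(x)^2 = \big(\sigma_z^{(1)}(x)^2, \ldots, \sigma_z^{(h)}(x)^2\big)$ collects the diagonal entries of $\Sigma_z(x)$. Additive constants and the common factor $1/2$ have been dropped.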
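The pieces of this objective can be sketched for a single sample in a few lines of plain Python. This is a minimal illustration under the Gaussian assumptions of Section 2, not the report's TensorFlow implementation; the function names and the choice to parameterize the encoder by the log-variance are assumptions made for this sketch.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Re-parameterization: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def pixel_loss(x, reconstruction):
    """Squared l2 distance between an image and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruction))


def kl_to_standard_normal(mu, log_var):
    """Closed form of D(N(mu, diag(sigma^2)) || N(0, I)), term (B) in equation (4)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def latent_loss(mu, log_var):
    """||mu||^2 + sum_j (sigma_j^2 - log sigma_j^2): twice the KL term plus a constant."""
    return sum(m * m for m in mu) + sum(math.exp(lv) - lv for lv in log_var)


def vae_objective(x, reconstruction, mu, log_var):
    """Per-sample objective: pixel loss plus latent loss."""
    return pixel_loss(x, reconstruction) + latent_loss(mu, log_var)


# Usage with made-up numbers: a 2-dimensional code and a 3-pixel "image".
rng = random.Random(0)
mu, log_var = [0.3, -0.1], [-0.5, 0.2]
z = sample_latent(mu, log_var, rng)
loss = vae_objective([0.0, 0.5, 1.0], [0.1, 0.4, 0.9], mu, log_var)
```

Because the noise `eps` is drawn independently of `mu` and `log_var`, the sample `z` is a deterministic function of the encoder outputs once `eps` is fixed, which is what lets gradients flow through the sampling step in an automatic-differentiation framework.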
3 Baseline Models

Kingma and Welling used two fully connected layers for the encoder and for the decoder in their first paper [9] on VAEs. Mathematically, the encoder is

\[ h_z(x) = \tanh(W_1 x + b_1), \qquad \mu_z(x) = W_2 h_z(x) + b_2, \qquad \log \sigma_z^2(x) = W_3 h_z(x) + b_3. \]

Then $z$ is generated according to $\mathcal{N}(z;\, \mu_z(x), \mathrm{diag}(\sigma_z^2(x)))$. Afterwards, the decoder is

\[ h_x(z) = \tanh(W_4 z + b_4), \qquad \mu_x(z) = W_5 h_x(z) + b_5, \qquad \log \sigma_x^2(z) = W_6 h_x(z) + b_6. \]

In the paper, they report results for VAEs trained on the MNIST and Frey Face datasets. Sample images generated by their network can be seen in Fig. 2.

Fig. 2: Generated digits (left) and faces (right) by the baseline model as reported in [9].

We were able to implement their model and reproduce the results on the MNIST dataset. However, the model had poor reconstruction and generation quality on CIFAR-10 (as will be shown in Fig. 5 in Section 5). Similarly poor results for CIFAR-10 have been reported in the literature even with complex networks [4]. We discuss the possible reasons behind this in Section 6.

4 Our Network and Implementation

When randomness in images is modeled at the pixel level, the implied distance between the reconstructed image and the original image is also measured pixel-by-pixel. For example, when the pixel values of an image are modeled as coming from a normal distribution around a mean, the distance becomes an $\ell_2$-distance, as in equation (4). It is well known that models trained using such a pixel-by-pixel $\ell_2$-loss suffer from a fundamental problem: the loss is incapable of capturing perceptual differences and spatial correlation between images [12]. A slight translation of pixels creates no perceptual difference to human eyes, but the $\ell_2$-loss between the original image and the translated image will be large. This is bound to affect the visual quality of the images generated by the fitted model.

It is well known that the early layers of pre-trained CNNs tend to capture spatial information of input images. It has also been observed that many of their filters resemble the classical Gabor filters, which are known to capture many shapes and spatial properties of an image (see, e.g., the paper by Yosinski et al. [20]). We believe that penalizing the difference between the activations of the original image and of the reconstructed image, when both are passed through such filters, will help us impose better spatial properties on the reconstructed images. Thus we adopt two changes: (1) we use convolutional layers in the encoder and decoder; and (2) we use an $\ell_2$-loss on the activations/features (to be referred to as the feature loss) from the first layer of a VGGNet [16] pre-trained on ImageNet [2], when it is fed the original and the reconstructed image. The details of the final network we use are outlined in Fig. 3. We refer to this network as the CNN model/network.

4.1 Implementation and Tools

We implemented our network in TensorFlow [1] using Python. We chose TensorFlow because it has a simple interface for building and training networks, has pre-trained networks available for extracting image features in our network, and its TensorBoard tool helps visualize and debug networks in a convenient way. Our network was trained on Amazon Web Services and Google Cloud Platform servers. The availability of two cloud computing services helped us explore hyper-parameters and network configurations in parallel.
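The sensitivity of the pixel-wise $\ell_2$-loss to translation, which motivates the feature loss of Section 4, can be illustrated on a tiny synthetic image. The sketch below is a toy illustration only: the 8 x 8 bar image and the two hand-written 3 x 3 filters in `KERNELS` are assumptions standing in for real images and for the first-layer VGGNet filters, which are not reproduced here.

```python
def l2_sq(a, b):
    """Squared l2 distance between two equally sized 2-D arrays (lists of rows)."""
    return sum((p - q) ** 2 for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding ("valid" mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def feature_loss(a, b, kernels):
    """Sum of squared l2 distances between filter responses of two images."""
    return sum(l2_sq(conv2d_valid(a, k), conv2d_valid(b, k)) for k in kernels)


def bar_image(column, size=8):
    """Image that is zero everywhere except for a vertical bar at `column`."""
    return [[1.0 if c == column else 0.0 for c in range(size)]
            for _ in range(size)]


# Hypothetical stand-ins for learned first-layer filters (not VGGNet weights).
KERNELS = [
    [[-1.0, 0.0, 1.0]] * 3,            # vertical-edge detector
    [[1.0 / 9.0] * 3 for _ in range(3)],  # local average
]

original = bar_image(3)
shifted = bar_image(4)                 # same bar, moved right by one pixel
blank = [[0.0] * 8 for _ in range(8)]

pixel_to_shifted = l2_sq(original, shifted)
pixel_to_blank = l2_sq(original, blank)
feature_to_shifted = feature_loss(original, shifted, KERNELS)
```

In this example the one-pixel shift gives a larger pixel loss than replacing the bar with an empty image, even though the shifted image looks the same to a human observer. `feature_loss` shows the general form of the feature penalty: the same squared distance, applied to filter responses rather than to raw pixels.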
5 Our Results

In this section, we present and discuss our final results, obtained by training the CNN network with and without the feature loss discussed in Section 4. Results are illustrated in the figures of this section. We get good results on the easy dataset, MNIST. Fig. 4 shows images that the trained network generates randomly (we remark that reconstruction is easier than generation). As can be seen, the generated images look quite natural. For CIFAR-10, however, the results are not as good. We compare four settings: the baseline model and the CNN model, each trained with pixel loss only and with pixel and feature loss. We make the following observations:

Reconstruction: Fig. 5 shows that the images reconstructed by the CNN model (right column) are much better than those of the fully connected baseline model. However, it is not clear whether including the feature loss is helpful. Row 2 shows reconstructed images obtained without the feature loss, while row 3 displays results with both pixel and feature loss (the latent loss is always used).
Network summary (from Fig. 3):

Encoder:
- CONV: 3 x 3 x 3 x 16
- CONV: 3 x 3 x 16 x 16
- MAX POOL: 2 x 2
- CONV: 3 x 3 x 16 x 32
- CONV: 3 x 3 x 32 x 32
- MAX POOL: 2 x 2
- 2 FC: 2048 x 100 (one for mean, one for variance)

Decoder:
- FC: 100 x 2048
- UPSAMPLING by 2
- CONV: 3 x 3 x 32 x 16
- CONV: 3 x 3 x 16 x 16
- UPSAMPLING by 2
- CONV: 3 x 3 x 16 x 3
- CONV: 3 x 3 x 3 x 3

When the encoder is fed with an image, the decoder should reconstruct it. When the decoder is fed with noise, it should generate a natural image.

Fig. 3: Our network consists of an encoder and a decoder with convolutional and fully connected layers. In addition, we use the output of the first convolutional layer of a VGGNet that is pre-trained on the ImageNet dataset as a feature extractor on the input image and on the decoder output. The features are used in the loss function together with the latent and pixel losses.
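The layer sizes in the summary can be cross-checked by tracing tensor shapes through the encoder and decoder. The sketch below assumes a 32 x 32 x 3 CIFAR-10 input, stride-1 convolutions with "same" padding, and the reading of `3 x 3 x in x out` as filter height x width x input channels x output channels; none of these conventions is stated explicitly in the figure, so they are labeled here as assumptions.

```python
def apply_step(shape, step):
    """Return the (height, width, channels) shape after one layer."""
    height, width, channels = shape
    kind, value = step
    if kind == "conv":      # 3 x 3 convolution, stride 1, "same" padding (assumed)
        return (height, width, value)
    if kind == "pool":      # max pooling with a value x value window
        return (height // value, width // value, channels)
    if kind == "up":        # upsampling by a factor of value
        return (height * value, width * value, channels)
    raise ValueError("unknown layer kind: " + kind)


def trace(shape, steps):
    """List the shapes produced after each step, starting from `shape`."""
    shapes = []
    for step in steps:
        shape = apply_step(shape, step)
        shapes.append(shape)
    return shapes


ENCODER = [("conv", 16), ("conv", 16), ("pool", 2),
           ("conv", 32), ("conv", 32), ("pool", 2)]
DECODER = [("up", 2), ("conv", 16), ("conv", 16),
           ("up", 2), ("conv", 3), ("conv", 3)]

encoder_shapes = trace((32, 32, 3), ENCODER)
height, width, channels = encoder_shapes[-1]
flattened = height * width * channels      # input size of the two FC layers

# The decoder's FC layer maps the 100-dimensional code back to `flattened`
# values, which are reshaped to the encoder's final feature-map shape.
decoder_shapes = trace(encoder_shapes[-1], DECODER)
```

Under these assumptions the encoder ends with an 8 x 8 x 32 feature map, whose 2048 entries match the `FC: 2048 x 100` layers, and the decoder returns to the 32 x 32 x 3 image size.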
Generation: Results with the CNN model are presented in Fig. 6. When we over-fit the network on a small dataset, we get decent-looking generated images, but they look like replicas of the input images. Training the model on the whole dataset, however, turns out to be quite hard. In this case, the generated images look very blurry, which agrees with observations noted in other works in the literature.

Fig. 4: When trained on MNIST, our VAE can generate realistic handwritten digits.

Fig. 5 (columns: baseline model, our model; rows: input images, pixel loss, pixel + feature loss): Using convolutional layers in the encoder/decoder, we get better reconstruction on CIFAR-10. However, using a loss on features did not yield significant improvements.

Fig. 6 (panels: small dataset, large dataset): Images generated by our VAE when trained on CIFAR-10 are not realistic.

5.1 Remarks

We list some possible changes to the network that may lead to better results. (We do not contrast with other techniques or models here, and only present tweaks suited to our model.)

As described in Section 4, we used the first layer of a pre-trained VGGNet for the feature loss. We tried a variety of weighting schemes for the three terms in the loss, namely the latent, pixel and feature losses, and used the Adam method [8] for training the network. In our experiments, we saw that the latent loss was sensitive to the chosen weights, and that it was relatively easy to end up with an unbounded latent loss. It would be interesting to try optimization methods that stabilize the learning process, such as gradient clipping [15], to see if we can train our network for weight combinations that would otherwise fail with a vanilla Adam method.

Next, the choice of using only the first layer was motivated by the well-known observation that the early layers of CNNs tend to resemble Gabor filters, which are also used extensively in classical computer vision to extract image features. We believed that modeling randomness at the feature level, and not simply at the pixel level, would make the model learn the spatial structure of natural images better. As further experiments, different pre-trained networks (such as ResNet [5] or GoogLeNet [18]) and different layers of these networks can be tried. Because building the current model and experimenting with training it took considerable time, we decided to conclude the project with our current results.

6 State of the Art

We discuss two state of the art VAE models: (1) the Deep Recurrent Attentive Writer (DRAW) [4], and (2) the VAE with Inverse Auto-Regressive Flow (IAF) [10]. DRAW uses an attention model for iterative construction of complex images. The paper reports good reconstruction of handwritten digits from the MNIST dataset by tracing lines, much like a person with a pen. However, the results on CIFAR-10 are not as good. Some randomly generated images are presented in Fig. 7a, and they appear blurry. In the VAE with IAF, Kingma et al. [10] apply multiple invertible parameterized transformations to the hidden variable, besides using ResNet for the encoder and decoder. This enables them to approximate the intractable posterior better, thereby improving the lower bound on the log-likelihood. Their trained network generates sharper images (Fig. 7b), which, however, look unrealistic on a closer look.

Fig. 7: Generated images from the state of the art models for CIFAR-10: (a) DRAW, (b) IAF.

The closest model in the literature to ours is a recent work by Hou et al. [7].
Here VGGNet features are used in the loss, much like in our loss function. Good results are reported for reconstruction and generation of images from the CelebFaces Attributes dataset (CelebA) [14], which has more than 200,000 images. They report an improvement when including the feature loss, in contrast to using just the pixel loss (Fig. 8).

Fig. 8: The feature consistent VAE [7] shows improvements when the loss incorporates features extracted using a pre-trained deep network. (top) Images generated using only the pixel loss appear blurry; (bottom) when the feature loss is incorporated, the images look sharper.

6.1 Remarks

The images generated by DRAW for CIFAR-10 are also blurry, and the VAE with IAF generates sharp but unrealistic images. The results in Fig. 8 look promising, and that motivated us to try harder. However, the authors made no comment on the performance of the feature loss on the CIFAR-10 dataset. We believe that, in contrast to CIFAR-10, the CelebA dataset (for which they report results) can be said to have convenient noise, as the images are quite homogeneous (faces of human beings). CIFAR-10 has been a tough dataset for VAEs. In fact, Gregor et al. remark after their results using DRAW: "CIFAR-10 is very diverse, and with only 50,000 training examples it is very difficult to generate realistic-looking objects without over-fitting (in other words, without copying from the training set)." We too observe a similar phenomenon.

7 Discussion

Often, the objective of training a model is to use the model for some specific purpose. Since learning the model is usually cast as an optimization problem, the choice of the objective function and the constraints should match the purpose for which the learned model will be used. For many tasks, learning a generative model can be converted to finding the maximum-likelihood model for the data. However, the natural question is: if the goal is to generate natural-looking images, should we try to learn the maximum-likelihood model?

Classical results in statistics show that, in the limit of infinite data and for a well-specified model class (that is, one where the true model belongs to the search space), the maximum likelihood estimate (MLE) of the model is consistent and recovers the true model. But in most applications, the data is finite and the model is mis-specified. Consequently, one needs to consider carefully whether MLE is the right approach for the task at hand. Theis et al. [19] argue that if the goal is to generate natural-looking images, then MLE is not the perfect match. Let $P$ denote the unknown distribution and let $Q$ denote the approximate distribution that we learn using the dataset at hand. The authors argue that maximizing the likelihood is approximately the same as solving

\[ \min_Q \; D(P \,\|\, Q). \tag{5} \]

On the other hand, using ideas from computational cognitive science, they claim that

\[ \min_Q \; D(Q \,\|\, P) \tag{6} \]

can be an idealized objective to train the model.
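The difference between objectives (5) and (6) can be explored on a small discrete example. The sketch below is a toy construction, not the experiment of Theis et al. [19]: the nine-bin grid, the bimodal target `P` and the candidate family of single discretized Gaussian bumps are all assumptions chosen for illustration. It selects the best candidate under each direction of the Kullback-Leibler divergence.

```python
import math

BINS = range(9)


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def bump(mean, std):
    """Discretized Gaussian bump on the grid, used both for P and for Q."""
    return normalize([math.exp(-((i - mean) ** 2) / (2 * std ** 2))
                      for i in BINS])


def kl(a, b):
    """Kullback-Leibler divergence D(a || b) for discrete distributions."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


# Target distribution: two well-separated modes at bins 1 and 7.
P = normalize([u + v for u, v in zip(bump(1, 0.5), bump(7, 0.5))])

# Mis-specified model family: a single bump with a chosen mean and width.
candidates = [(mean, std) for mean in BINS for std in (0.5, 1.0, 2.0, 4.0)]

best_forward = min(candidates, key=lambda c: kl(P, bump(*c)))  # objective (5)
best_reverse = min(candidates, key=lambda c: kl(bump(*c), P))  # objective (6)
```

In this toy setting, `best_forward` is centered between the two modes and therefore places mass on bins where `P` has almost none, while `best_reverse` sits on one of the two modes and ignores the other.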
For finite data and the mis-specified case, the model learned by solving (5) tends to overgeneralize and put mass on areas where $P$ has zero mass, leading to samples that look non-natural. On the other hand, in the same scenario, the model learned from (6) tends to focus on the good modes. Put simply, MLE tends to overgeneralize, while the solution of (6) under-generalizes. Such a claim raises the question of whether the VAE approach, which focuses on finding the MLE, is the right way to learn a generative model for natural-looking images.

Fig. 9 (taken from [19]) illustrates this using a simple toy example. Here $P$ is a mixture of Gaussians, while $Q$ is a fit from among the isotropic Gaussian distributions with equal variance along the two axes. While the model learned by minimizing the KL divergence (KLD) in (5) puts a lot of mass on non-data regions, minimizing other divergences such as maximum mean discrepancy (MMD) or Jensen-Shannon divergence (JSD) gives a model that fits one of the modes well but ignores other parts of the data.

Fig. 9 (panels: DATA, KLD, MMD, JSD): Illustration of the trade-off across various fits of an isotropic Gaussian to a dataset drawn from a mixture of Gaussians.

Lessons Learned

In this project, we tried to generate natural-looking images using VAEs by training them on the MNIST and CIFAR-10 datasets. We have seen that using deep CNNs yields better performance for reconstruction and generation of complex images than using only shallow fully-connected networks. We tried improving the naturalness of the generated images by incorporating a loss based on image features into the objective function. Although there is prior work [7] showing improvements on another dataset (CelebA), we could not observe significant improvements on the CIFAR-10 dataset. There is significant literature stating the hardness of generating natural-looking images using CIFAR-10, and our results align with those statements. On the implementation of neural networks, we have seen that debugging and training a big network takes a lot of coding effort and time, even with compute clusters. For that, visualization and debugging tools such as TensorBoard are very useful.
Furthermore, we have seen that good initialization of variables is important for convergence of the training algorithm, and that techniques such as Xavier initialization help a lot [3].

Team Contributions

Both team members contributed a similar amount of effort. They shared the work on both the theoretical understanding and the implementation of the various models. TensorFlow was new to the team, and time was initially devoted to learning how to use it and then to experimenting with the tool. Orhan, having more interest in the implementation, had an edge in exploring various networks, finding good tools and implementing them efficiently. Raaz had useful discussions with Orhan on good coding practices and TensorBoard. Raaz, having more interest in the theory side, had an edge in learning the different directions and discussions in the literature. Orhan had useful discussions with Raaz on the theory behind VAEs and on different works on generative models. The team members learned how to install and run TensorFlow on compute servers; Raaz learned about using Amazon Web Services, Orhan learned about using Google Cloud Platform, and they taught each other how to use the platform they had learned. Both spent equal time trying different networks and parameters in order to improve the preliminary results. The marginal difference in effort, if any, was compensated by time devoted to preparing the slides, poster, report and GitHub page for the project.

To conclude, the team members think that Orhan's contribution is 50% and Raaz's contribution is 50%. The team members arrived at this breakdown not by providing evidence to increase their own shares, but by arguing how valuable the other member's contributions were to this project and that, without those efforts, this project would not have been the same.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database.
In IEEE Conference on Computer Vision and Pattern Recognition. IEEE.
[3] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, volume 9.
[4] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. DRAW: A recurrent neural network for image generation. arXiv preprint.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint.
[6] Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal. The wake-sleep algorithm for unsupervised neural networks. Science, 268(5214):1158.
[7] Xianxu Hou, Linlin Shen, Ke Sun, and Guoping Qiu. Deep feature consistent variational autoencoder. arXiv preprint, October.
[8] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint.
[9] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.
[10] Diederik P Kingma, Tim Salimans, and Max Welling. Improving variational inference with inverse autoregressive flow. arXiv preprint.
[11] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images.
[12] Jon C Leachtenauer, William Malila, John Irvine, Linda Colburn, and Nanette Salvaggio. General image-quality equation: GIQE. Applied Optics, 36(32).
[13] Yann LeCun, Corinna Cortes, and Christopher JC Burges. The MNIST database of handwritten digits.
[14] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV).
[15] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. ICML (3), 28.
[16] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.
[17] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. How to train deep variational autoencoders and probabilistic ladder networks. arXiv preprint.
[18] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9.
[19] Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. arXiv preprint.
[20] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems.
More informationarxiv: v1 [cs.cv] 17 Nov 2016
Inverting The Generator Of A Generative Adversarial Network arxiv:1611.05644v1 [cs.cv] 17 Nov 2016 Antonia Creswell BICV Group Bioengineering Imperial College London ac2211@ic.ac.uk Abstract Anil Anthony
More informationQuantifying Translation-Invariance in Convolutional Neural Networks
Quantifying Translation-Invariance in Convolutional Neural Networks Eric Kauderer-Abrams Stanford University 450 Serra Mall, Stanford, CA 94305 ekabrams@stanford.edu Abstract A fundamental problem in object
More informationarxiv: v6 [stat.ml] 15 Jun 2015
VARIATIONAL RECURRENT AUTO-ENCODERS Otto Fabius & Joost R. van Amersfoort Machine Learning Group University of Amsterdam {ottofabius,joost.van.amersfoort}@gmail.com ABSTRACT arxiv:1412.6581v6 [stat.ml]
More informationPart Localization by Exploiting Deep Convolutional Networks
Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.
More informationReal-time convolutional networks for sonar image classification in low-power embedded systems
Real-time convolutional networks for sonar image classification in low-power embedded systems Matias Valdenegro-Toro Ocean Systems Laboratory - School of Engineering & Physical Sciences Heriot-Watt University,
More informationREGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION
REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological
More informationStudy of Residual Networks for Image Recognition
Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks
More informationImplicit generative models: dual vs. primal approaches
Implicit generative models: dual vs. primal approaches Ilya Tolstikhin MPI for Intelligent Systems ilya@tue.mpg.de Machine Learning Summer School 2017 Tübingen, Germany Contents 1. Unsupervised generative
More informationVariational Autoencoders. Sargur N. Srihari
Variational Autoencoders Sargur N. srihari@cedar.buffalo.edu Topics 1. Generative Model 2. Standard Autoencoder 3. Variational autoencoders (VAE) 2 Generative Model A variational autoencoder (VAE) is a
More information19: Inference and learning in Deep Learning
10-708: Probabilistic Graphical Models 10-708, Spring 2017 19: Inference and learning in Deep Learning Lecturer: Zhiting Hu Scribes: Akash Umakantha, Ryan Williamson 1 Classes of Deep Generative Models
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial Networks Luke de Oliveira Vai Technologies Lawrence Berkeley National Laboratory @lukede0 @lukedeo lukedeo@vaitech.io https://ldo.io 1 Outline Why Generative Modeling?
More informationRETRIEVAL OF FACES BASED ON SIMILARITIES Jonnadula Narasimha Rao, Keerthi Krishna Sai Viswanadh, Namani Sandeep, Allanki Upasana
ISSN 2320-9194 1 Volume 5, Issue 4, April 2017, Online: ISSN 2320-9194 RETRIEVAL OF FACES BASED ON SIMILARITIES Jonnadula Narasimha Rao, Keerthi Krishna Sai Viswanadh, Namani Sandeep, Allanki Upasana ABSTRACT
More informationVisual Recommender System with Adversarial Generator-Encoder Networks
Visual Recommender System with Adversarial Generator-Encoder Networks Bowen Yao Stanford University 450 Serra Mall, Stanford, CA 94305 boweny@stanford.edu Yilin Chen Stanford University 450 Serra Mall
More informationDeep Learning in Visual Recognition. Thanks Da Zhang for the slides
Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object
More information(University Improving of Montreal) Generative Adversarial Networks with Denoising Feature Matching / 17
Improving Generative Adversarial Networks with Denoising Feature Matching David Warde-Farley 1 Yoshua Bengio 1 1 University of Montreal, ICLR,2017 Presenter: Bargav Jayaraman Outline 1 Introduction 2 Background
More informationDynamic Routing Between Capsules
Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet
More informationApplication of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset
Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Suyash Shetty Manipal Institute of Technology suyash.shashikant@learner.manipal.edu Abstract In
More informationVideo Generation Using 3D Convolutional Neural Network
Video Generation Using 3D Convolutional Neural Network Shohei Yamamoto Grad. School of Information Science and Technology The University of Tokyo yamamoto@mi.t.u-tokyo.ac.jp Tatsuya Harada Grad. School
More informationDOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION
DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION Yen-Cheng Liu 1, Wei-Chen Chiu 2, Sheng-De Wang 1, and Yu-Chiang Frank Wang 1 1 Graduate Institute of Electrical Engineering,
More informationAutoencoder. Representation learning (related to dictionary learning) Both the input and the output are x
Deep Learning 4 Autoencoder, Attention (spatial transformer), Multi-modal learning, Neural Turing Machine, Memory Networks, Generative Adversarial Net Jian Li IIIS, Tsinghua Autoencoder Autoencoder Unsupervised
More informationarxiv: v2 [cs.lg] 17 Dec 2018
Lu Mi 1 * Macheng Shen 2 * Jingzhao Zhang 2 * 1 MIT CSAIL, 2 MIT LIDS {lumi, macshen, jzhzhang}@mit.edu The authors equally contributed to this work. This report was a part of the class project for 6.867
More informationDOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION
2017 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 25 28, 2017, TOKYO, JAPAN DOMAIN-ADAPTIVE GENERATIVE ADVERSARIAL NETWORKS FOR SKETCH-TO-PHOTO INVERSION Yen-Cheng Liu 1,
More informationProgressive Neural Architecture Search
Progressive Neural Architecture Search Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy 09/10/2018 @ECCV 1 Outline Introduction
More informationMulti-Glance Attention Models For Image Classification
Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We
More informationDeep Generative Models and a Probabilistic Programming Library
Deep Generative Models and a Probabilistic Programming Library Discriminative (Deep) Learning Learn a (differentiable) function mapping from input to output x f(x; θ) y Gradient back-propagation Generative
More informationScore function estimator and variance reduction techniques
and variance reduction techniques Wilker Aziz University of Amsterdam May 24, 2018 Wilker Aziz Discrete variables 1 Outline 1 2 3 Wilker Aziz Discrete variables 1 Variational inference for belief networks
More informationarxiv: v2 [cs.lg] 9 Jun 2017
Shengjia Zhao 1 Jiaming Song 1 Stefano Ermon 1 arxiv:1702.08396v2 [cs.lg] 9 Jun 2017 Abstract Deep neural networks have been shown to be very successful at learning feature hierarchies in supervised learning
More informationNeural Networks with Input Specified Thresholds
Neural Networks with Input Specified Thresholds Fei Liu Stanford University liufei@stanford.edu Junyang Qian Stanford University junyangq@stanford.edu Abstract In this project report, we propose a method
More information3D model classification using convolutional neural network
3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing
More informationGroupout: A Way to Regularize Deep Convolutional Neural Network
Groupout: A Way to Regularize Deep Convolutional Neural Network Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu Abstract Groupout is a new technique
More informationCIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]
CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.
More informationSmart Parking System using Deep Learning. Sheece Gardezi Supervised By: Anoop Cherian Peter Strazdins
Smart Parking System using Deep Learning Sheece Gardezi Supervised By: Anoop Cherian Peter Strazdins Content Labeling tool Neural Networks Visual Road Map Labeling tool Data set Vgg16 Resnet50 Inception_v3
More informationSemi-Amortized Variational Autoencoders
Semi-Amortized Variational Autoencoders Yoon Kim Sam Wiseman Andrew Miller David Sontag Alexander Rush Code: https://github.com/harvardnlp/sa-vae Background: Variational Autoencoders (VAE) (Kingma et al.
More informationGAN and Feature Representation. Hung-yi Lee
GAN and Feature Representation Hung-yi Lee Outline Generator (Decoder) Discrimi nator + Encoder GAN+Autoencoder x InfoGAN Encoder z Generator Discrimi (Decoder) x nator scalar Discrimi z Generator x scalar
More informationDeep Learning With Noise
Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu
More informationPixel-level Generative Model
Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,
More informationOne Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models
One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models [Supplemental Materials] 1. Network Architecture b ref b ref +1 We now describe the architecture of the networks
More informationDeeply Cascaded Networks
Deeply Cascaded Networks Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu 1 Introduction After the seminal work of Viola-Jones[15] fast object
More informationDEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE MACHINE LEARNING & DATA MINING - LECTURE 17
DEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE 155 - MACHINE LEARNING & DATA MINING - LECTURE 17 GENERATIVE MODELS DATA 3 DATA 4 example 1 DATA 5 example 2 DATA 6 example 3 DATA 7 number of
More informationIterative Inference Models
Iterative Inference Models Joseph Marino, Yisong Yue California Institute of Technology {jmarino, yyue}@caltech.edu Stephan Mt Disney Research stephan.mt@disneyresearch.com Abstract Inference models, which
More informationSemantic Segmentation. Zhongang Qi
Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in
More informationThe Multi-Entity Variational Autoencoder
The Multi-Entity Variational Autoencoder Charlie Nash 1,2, S. M. Ali Eslami 2, Chris Burgess 2, Irina Higgins 2, Daniel Zoran 2, Theophane Weber 2, Peter Battaglia 2 1 Edinburgh University 2 DeepMind Abstract
More informationDenoising Adversarial Autoencoders
Denoising Adversarial Autoencoders Antonia Creswell BICV Imperial College London Anil Anthony Bharath BICV Imperial College London Email: ac2211@ic.ac.uk arxiv:1703.01220v4 [cs.cv] 4 Jan 2018 Abstract
More informationarxiv: v3 [cs.lg] 30 Dec 2016
Video Ladder Networks Francesco Cricri Nokia Technologies francesco.cricri@nokia.com Xingyang Ni Tampere University of Technology xingyang.ni@tut.fi arxiv:1612.01756v3 [cs.lg] 30 Dec 2016 Mikko Honkala
More informationAuxiliary Deep Generative Models
Downloaded from orbit.dtu.dk on: Dec 12, 2018 Auxiliary Deep Generative Models Maaløe, Lars; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Winther, Ole Published in: Proceedings of the 33rd International
More informationEnd-to-end Training of Differentiable Pipelines Across Machine Learning Frameworks
End-to-end Training of Differentiable Pipelines Across Machine Learning Frameworks Mitar Milutinovic Computer Science Division University of California, Berkeley mitar@cs.berkeley.edu Robert Zinkov zinkov@robots.ox.ac.uk
More informationGenerative Modeling with Convolutional Neural Networks. Denis Dus Data Scientist at InData Labs
Generative Modeling with Convolutional Neural Networks Denis Dus Data Scientist at InData Labs What we will discuss 1. 2. 3. 4. Discriminative vs Generative modeling Convolutional Neural Networks How to
More informationREVISITING DISTRIBUTED SYNCHRONOUS SGD
REVISITING DISTRIBUTED SYNCHRONOUS SGD Jianmin Chen, Rajat Monga, Samy Bengio & Rafal Jozefowicz Google Brain Mountain View, CA, USA {jmchen,rajatmonga,bengio,rafalj}@google.com 1 THE NEED FOR A LARGE
More informationDeep Residual Learning
Deep Residual Learning MSRA @ ILSVRC & COCO 2015 competitions Kaiming He with Xiangyu Zhang, Shaoqing Ren, Jifeng Dai, & Jian Sun Microsoft Research Asia (MSRA) MSRA @ ILSVRC & COCO 2015 Competitions 1st
More informationDeep Neural Network Hyperparameter Optimization with Genetic Algorithms
Deep Neural Network Hyperparameter Optimization with Genetic Algorithms EvoDevo A Genetic Algorithm Framework Aaron Vose, Jacob Balma, Geert Wenes, and Rangan Sukumar Cray Inc. October 2017 Presenter Vose,
More informationFacial Key Points Detection using Deep Convolutional Neural Network - NaimishNet
1 Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet Naimish Agarwal, IIIT-Allahabad (irm2013013@iiita.ac.in) Artus Krohn-Grimberghe, University of Paderborn (artus@aisbi.de)
More informationAutoencoding Beyond Pixels Using a Learned Similarity Metric
Autoencoding Beyond Pixels Using a Learned Similarity Metric International Conference on Machine Learning, 2016 Anders Boesen Lindbo Larsen, Hugo Larochelle, Søren Kaae Sønderby, Ole Winther Technical
More informationAuto-encoder with Adversarially Regularized Latent Variables
Information Engineering Express International Institute of Applied Informatics 2017, Vol.3, No.3, P.11 20 Auto-encoder with Adversarially Regularized Latent Variables for Semi-Supervised Learning Ryosuke
More informationTOWARDS A NEURAL STATISTICIAN
TOWARDS A NEURAL STATISTICIAN Harrison Edwards School of Informatics University of Edinburgh Edinburgh, UK H.L.Edwards@sms.ed.ac.uk Amos Storkey School of Informatics University of Edinburgh Edinburgh,
More informationInception and Residual Networks. Hantao Zhang. Deep Learning with Python.
Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)
More informationLab meeting (Paper review session) Stacked Generative Adversarial Networks
Lab meeting (Paper review session) Stacked Generative Adversarial Networks 2017. 02. 01. Saehoon Kim (Ph. D. candidate) Machine Learning Group Papers to be covered Stacked Generative Adversarial Networks
More informationEFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang
EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang Image Formation and Processing (IFP) Group, University of Illinois at Urbana-Champaign
More informationCOMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization
More informationDeep Learning Applications
October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning
More informationLearning Convolutional Neural Networks using Hybrid Orthogonal Projection and Estimation
Proceedings of Machine Learning Research 77:1 16, 2017 ACML 2017 Learning Convolutional Neural Networks using Hybrid Orthogonal Projection and Estimation Hengyue Pan PANHY@CSE.YORKU.CA Hui Jiang HJ@CSE.YORKU.CA
More informationCRESCENDONET: A NEW DEEP CONVOLUTIONAL NEURAL NETWORK WITH ENSEMBLE BEHAVIOR
CRESCENDONET: A NEW DEEP CONVOLUTIONAL NEURAL NETWORK WITH ENSEMBLE BEHAVIOR Anonymous authors Paper under double-blind review ABSTRACT We introduce a new deep convolutional neural network, CrescendoNet,
More informationLEARNING TO INFER ABSTRACT 1 INTRODUCTION. Under review as a conference paper at ICLR Anonymous authors Paper under double-blind review
LEARNING TO INFER Anonymous authors Paper under double-blind review ABSTRACT Inference models, which replace an optimization-based inference procedure with a learned model, have been fundamental in advancing
More informationTiny ImageNet Challenge Submission
Tiny ImageNet Challenge Submission Lucas Hansen Stanford University lucash@stanford.edu Abstract Implemented a deep convolutional neural network on the GPU using Caffe and Amazon Web Services (AWS). Current
More informationDeep Learning for Visual Manipulation and Synthesis
Deep Learning for Visual Manipulation and Synthesis Jun-Yan Zhu 朱俊彦 UC Berkeley 2017/01/11 @ VALSE What is visual manipulation? Image Editing Program input photo User Input result Desired output: stay
More informationarxiv: v1 [cs.cv] 16 Jul 2017
enerative adversarial network based on resnet for conditional image restoration Paper: jc*-**-**-****: enerative Adversarial Network based on Resnet for Conditional Image Restoration Meng Wang, Huafeng
More informationarxiv: v1 [cs.lg] 24 Jan 2019
Jaehoon Cha Kyeong Soo Kim Sanghuyk Lee arxiv:9.879v [cs.lg] Jan 9 Abstract Noting the importance of the latent variables in inference and learning, we propose a novel framework for autoencoders based
More informationIn Defense of Fully Connected Layers in Visual Representation Transfer
In Defense of Fully Connected Layers in Visual Representation Transfer Chen-Lin Zhang, Jian-Hao Luo, Xiu-Shen Wei, Jianxin Wu National Key Laboratory for Novel Software Technology, Nanjing University,
More informationAutoencoders, denoising autoencoders, and learning deep networks
4 th CiFAR Summer School on Learning and Vision in Biology and Engineering Toronto, August 5-9 2008 Autoencoders, denoising autoencoders, and learning deep networks Part II joint work with Hugo Larochelle,
More informationLecture 19: Generative Adversarial Networks
Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,
More informationarxiv: v1 [eess.sp] 23 Oct 2018
Reproducing AmbientGAN: Generative models from lossy measurements arxiv:1810.10108v1 [eess.sp] 23 Oct 2018 Mehdi Ahmadi Polytechnique Montreal mehdi.ahmadi@polymtl.ca Mostafa Abdelnaim University de Montreal
More informationA Fast Learning Algorithm for Deep Belief Nets
A Fast Learning Algorithm for Deep Belief Nets Geoffrey E. Hinton, Simon Osindero Department of Computer Science University of Toronto, Toronto, Canada Yee-Whye Teh Department of Computer Science National
More informationProceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong
, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA
More informationInference and Representation
Inference and Representation Rachel Hodos New York University Lecture 5, October 6, 2015 Rachel Hodos Lecture 5: Inference and Representation Today: Learning with hidden variables Outline: Unsupervised
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationReal-time Object Detection CS 229 Course Project
Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection
More informationBackground-Foreground Frame Classification
Background-Foreground Frame Classification CS771A: Machine Learning Techniques Project Report Advisor: Prof. Harish Karnick Akhilesh Maurya Deepak Kumar Jay Pandya Rahul Mehra (12066) (12228) (12319) (12537)
More informationarxiv: v2 [cs.cv] 26 Jan 2018
DIRACNETS: TRAINING VERY DEEP NEURAL NET- WORKS WITHOUT SKIP-CONNECTIONS Sergey Zagoruyko, Nikos Komodakis Université Paris-Est, École des Ponts ParisTech Paris, France {sergey.zagoruyko,nikos.komodakis}@enpc.fr
More informationResearch on Pruning Convolutional Neural Network, Autoencoder and Capsule Network
Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Tianyu Wang Australia National University, Colledge of Engineering and Computer Science u@anu.edu.au Abstract. Some tasks,
More informationGradient of the lower bound
Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation gaurav.pandey@csa.iisc.ernet.in Objective Given a training set that comprises image and image-level
More informationAuxiliary Guided Autoregressive Variational Autoencoders
Auxiliary Guided Autoregressive Variational Autoencoders Thomas Lucas, Jakob Verbeek To cite this version: Thomas Lucas, Jakob Verbeek. Auxiliary Guided Autoregressive Variational Autoencoders. 2017.
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationAn Empirical Study of Generative Adversarial Networks for Computer Vision Tasks
An Empirical Study of Generative Adversarial Networks for Computer Vision Tasks Report for Undergraduate Project - CS396A Vinayak Tantia (Roll No: 14805) Guide: Prof Gaurav Sharma CSE, IIT Kanpur, India
More informationAdversarial Symmetric Variational Autoencoder
Adversarial Symmetric Variational Autoencoder Yunchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li and Lawrence Carin Department of Electrical and Computer Engineering, Duke University
More informationCapsule Networks. Eric Mintun
Capsule Networks Eric Mintun Motivation An improvement* to regular Convolutional Neural Networks. Two goals: Replace max-pooling operation with something more intuitive. Keep more info about an activated
More informationUnsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning
Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the
More informationAutoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN.
Stacked Denoising Sparse Variational (Adapted from Paul Quint and Ian Goodfellow) Stacked Denoising Sparse Variational Autoencoding is training a network to replicate its input to its output Applications:
More information