Deep Image Inpainting

Charles Burlin, Yoann Le Calonnec, Louis Duperier

Abstract

We present a new take on several image inpainting techniques on small, simple images from CIFAR10. We improve context encoders by implementing several major GAN training tricks and by adapting the network to WGAN. We also compare encoders and discriminators based on existing state-of-the-art models against basic CNN architectures. In parallel, we work on density-based methods and implement Pixel CNN, Diagonal BiLSTM and Row LSTM. We propose a variation on the first, and introduce a simpler model, the Flattened Row LSTM. We show that we can get good results on CIFAR10 and reconcile L2 loss and visual quality.

Introduction

Image inpainting consists in rebuilding missing or damaged patches of an image. Typical applications are the restoration of old photos or paintings, as well as image editing: Photoshop has a powerful completion tool (which can also be used as a removal tool). While Convolutional Neural Networks (CNNs) now yield better-than-human classification accuracy on ImageNet [2], inpainting results are still visibly worse than human predictions. This can be explained by the fact that there are about 50,000 different ways to fill in a small 8x8x3 section, whereas ImageNet only has 32,000 classes. Interestingly, it is easy for us to mentally reconstruct a missing section of an image: we compare the context with our knowledge of the world. This allows us to recognize scenes and objects and extrapolate missing parts from memory, stitching them to the context. Artificial methods rely on the same principles.

There are two types of methods. Local methods only use information from the context, such as color or texture, and try to expand and merge it in a smooth way. These need very little training or prior knowledge. A nose removed from a face would be replaced by a patch of skin-textured pixels: the model cannot infer the existence of an object, a nose, at this position when it is completely missing. They work great for watermark removal, but not for larger missing patches. More powerful methods are global, context-based and semantic. They learn to recognize patterns in images (a window, a car) and use them to fill the gaps. These methods understand the necessity of a nose at a certain position of a face and are able to construct one that fits the context from their knowledge.

An interesting aspect of this type of problem is the ability to easily generate massive training datasets: any image dataset (we used CIFAR10 and ImageNet) can be preprocessed by simply altering images. One can generate hundreds of millions of training examples and train very deep networks.

1. Problem Statement

Each image is divided into two sections: the missing part that we try to reconstruct, and the context. For clarity, we assume the missing section is a square of dimensions n x n, but the network would work identically for arbitrary removals.

Figure 1. Definition of Image Inpainting

Image inpainting is usually framed as a constrained image generation problem. The network has to take a context as input and to output an image of the same dimensions as the missing patch. Final evaluation is based on the average element-wise L2 distance between the original missing section Y \in \mathbb{R}^{n \times n \times 3} and the prediction \hat{Y} \in \mathbb{R}^{n \times n \times 3}. In our running example on CIFAR10, n = 8.
For a sample i, the loss is

L_i = \frac{1}{n^2} \sum_{p,q,r} \left( Y^{(i)}_{p,q,r} - \hat{Y}^{(i)}_{p,q,r} \right)^2

Another interesting evaluation-only metric we propose is the approximate exact-match (AEM). We notice that a shift of a few units in a channel pixel value has very little visual impact (see Figure 2). Hence, if a predicted pixel is within ±5 of the ground truth on each of the three channels, it can be considered an exact match. We will report the mean AEM, or MAEM, where 100% means that the image is visually almost indistinguishable from the ground truth.
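For concreteness, here is a minimal NumPy sketch of both evaluation metrics, assuming the target and the prediction are n x n x 3 arrays of pixel values in [0, 255]; the function and variable names are illustrative, not taken from our codebase:

```python
import numpy as np

def l2_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Per-sample reconstruction loss: squared error summed over channels
    and averaged over the n^2 spatial positions of the missing patch."""
    n = y_true.shape[0]
    diff = y_true.astype(np.float64) - y_pred.astype(np.float64)
    return float(np.sum(diff ** 2) / n ** 2)

def aem(y_true: np.ndarray, y_pred: np.ndarray, tol: int = 5) -> float:
    """Approximate exact-match: fraction of pixels whose prediction is
    within +/- tol of the ground truth on all three channels."""
    close = np.abs(y_true.astype(int) - y_pred.astype(int)) <= tol
    return float(np.mean(np.all(close, axis=-1)))
```

The reported MAEM is then simply the mean of this per-sample AEM over the evaluation set.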

Figure 2. Original image (left), ±5 randomly added to each pixel channel (right)

There are two types of inpainting: blind, where the network does not know the position and shape of the missing area, and non-blind, where this information is passed as input. From our literature review, it appears that blind inpainting is a very hard problem. Although more documented, non-blind inpainting still leaves a lot of room for improvement. Hence our decision to focus on the latter. Our main goal is to leverage the latest successful architectures and techniques in computer vision to build an efficient and robust inpainter. We hope to achieve acceptable results in terms of L2 loss compared to state-of-the-art models.

Our preliminary tests and the literature suggest very small practical differences between using rectangular masks of randomly chosen shape and position and using a square mask at the center of the image. Hence, as a coding simplification, we mostly work on centered square masks of constant size. On the CIFAR10 dataset, we achieve this by removing an 8x8 patch in the middle of each image.

2. Related Work

Several authors have explored a large variety of techniques to address similar issues. In [3], Pathak et al. use an approach that was our initial inspiration. They modify the typical GAN architecture by inputting the image context instead of random noise to predict the missing patch. They emphasize the importance of Leaky ReLU in the discriminator and of the absence of pooling: compressions and decompressions are made using strides different from 1. They use a combination of L2 loss and adversarial loss (how well the generator can fool the discriminator) to train the model. However, they use fully connected layers in several places, which we found increases the risk of overfitting. Also, they use relatively simple CNN architectures for the encoder and decoder: we wanted to try modern architectures that yield state-of-the-art results on other problems (VGG, Inception, etc.).

In [4], published by a team of researchers from Google DeepMind, the authors propose a way to model the distribution of images as a product of conditional probabilities. The idea is to predict the pixels of an image in a specific order, for instance from the top-left corner to the bottom-right corner. The conditional probability functions are modeled either by a convolutional neural network on a neighborhood of pixels (Pixel CNN), or by a recurrent neural network that encapsulates the information from pixels located in the top rows of the image. Though this method is initially meant to generate images, it can be used to solve our problem of maximizing the likelihood of the reconstructed image given the pixels we already know.

In [11], Yang et al. propose a method to tackle inpainting of large sections of large images. The issue with traditional models on this task is that they tend to produce blurry results, with visible edges between the context and the reconstruction. They adapt multi-scale techniques to generate high-frequency details on top of the reconstructed object and obtain high-resolution inpainting.
They train two networks: a content network evaluated with a holistic content loss, similar to the approach of [3], and a texture network that minimizes a local texture loss. They use neural patch matching to ensure that the texture inside the reconstructed crop is similar to the texture of the local context. This dual approach yields much sharper results visually, but it is not necessary in our case (small images).

Our architecture/pipeline is a heavily adapted version of the CIFAR10 classification pipeline from the TensorFlow tutorials; the only parts we kept are the queuing system and the monitoring session. The autoencoders/GANs/WGANs we use are not available online in TensorFlow. One repository provides an adaptation of [3] in TensorFlow, but it is neither efficient nor functional, so we did not use it. The state-of-the-art models we experimented with (VGG, Inception, ResNet) come from the TensorFlow model zoo, and we had to modify them to fit CIFAR10 rather than ImageNet (usually simplifying the model, as our images are simpler). PixelCNN implementations exist online, but we did not use them as we wanted to gain experience implementing the model ourselves; we also made some changes compared to the article, as described below, so a pre-existing implementation would not have been very useful.

3. Dataset

Our main dataset is CIFAR10: composed of 32x32x3 images, it contains 50,000 train examples and 10,000 test examples. CIFAR10 is a subset of the 80M Tiny Images dataset. Once our algorithms are stabilized, we can leverage this larger dataset to train on a greater variety of images (CIFAR10 only contains 10 classes), and the gigantic number of samples is a very efficient solution to overfitting. Our pipeline allows us to perform this scaling very easily and achieve high training performance.
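As an illustration of the preprocessing used throughout this work, here is a minimal NumPy sketch, assuming 32x32x3 CIFAR10 images, that splits each image into the context (with the centered 8x8 patch replaced by zeros, as discussed below) and the ground-truth target; the function name is ours, for illustration only:

```python
import numpy as np

def mask_center(image: np.ndarray, n: int = 8):
    """Split a 32x32x3 image into (context, target).

    The centered n x n patch is the ground truth we try to reconstruct;
    in the context it is replaced by zeros."""
    h, w, _ = image.shape
    top, left = (h - n) // 2, (w - n) // 2
    target = image[top:top + n, left:left + n].copy()
    context = image.copy()
    context[top:top + n, left:left + n] = 0
    return context, target
```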

Figure 3. Overview of our basic autoencoder architecture

So far, our models train in less than a few hours on the 50,000 examples of CIFAR10. As an intermediate step, we use data augmentation on CIFAR10 to increase the robustness of our algorithms to small changes and to virtually generate a larger number of samples. We apply small random hue, saturation and contrast changes to the images, as well as random Gaussian blur. We replace the center 8x8x3 crop by zeros. Some articles suggest that using random Gaussian noise or the mean context color for the mask would yield better results, but we did not notice significant improvements.

4. Autoencoders

Inpainting is part of a large set of image generation problems whose goal is to create or modify pixels: deblurring, denoising, text removal (i.e. small-scale blind inpainting). Methods to solve those problems usually rely on autoencoders. Autoencoders are made of two networks: the encoder and the decoder. First, the encoder maps the image to a lower-dimensional representation (an embedding). Then, the decoder works from this embedding to reconstruct the original image. The two networks are trained jointly to minimize the difference between input and output. This architecture forces the encoder to encode as much information as possible into the embedding. Since this is a small vector (a bottleneck), the encoder has to learn high-level, smarter features to compress the input with minimum loss. Those high-level features are hence more robust to noise and changes than raw pixels.

For CIFAR10, the input image is a vector of length 3072, and we found that a bottleneck of 1024 or 512 worked well. Smaller ones do not allow enough space to encode the input information. Larger ones are not a strong enough incentive for the autoencoder to find concise representations: storing patches of raw pixels does the work. In the literature, the ratio between image size and bottleneck size is usually greater than ours (3 to 6). Paradoxically, this is because we are working on small images, which makes it harder to spot coherent objects and elements in the image. Higher compression can be achieved on ImageNet (12 or more, for instance) because it is easier to isolate high-level features there.

In order to establish a baseline, we first built a simple CNN architecture. The input is the 32x32x3 masked image, and the output is the 8x8x3 missing patch. The literature encourages using several small filters rather than bigger ones: we can get the same receptive field with deeper networks. For the entire project, we only used filters of size 3. We tested sizes 5 and 7 several times, but they always performed worse. We experimented with a [CONV-CONV-POOL2] x 3 architecture, followed by 2 fully connected layers or 2 convolutional layers. This architecture is analogous to a canonical well-performing architecture on CIFAR10 and allows for smooth dimension reduction along with an increase in the number of filters (starting at 32 or 64 and doubling at each pooling). We then eliminated max pooling entirely, replacing it with convolution layers of stride 2; the number of filters follows the same progression. Instead of using a fully connected layer, moving from the 4x4x256 convolutions to the bottleneck can be seen as a stride-4 convolution with 1024 filters. In every case, ReLU proved to work better.
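The following is a minimal tf.keras sketch of a baseline of this kind, assuming the progression described above (filters 64, 128, 256 with stride-2 convolutions, then a stride-4 convolution acting as a 1024-dimensional bottleneck); the exact depth, decoder, and hyperparameters of our implementation may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_baseline_autoencoder():
    """Encoder: stride-2 3x3 convolutions instead of pooling, filters doubling
    at each reduction, then a stride-4 convolution as the 1x1x1024 bottleneck.
    Decoder: transposed convolutions up to the 8x8x3 missing patch."""
    inputs = layers.Input(shape=(32, 32, 3))                                                # masked context
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)          # 16x16x64
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)              # 8x8x128
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)              # 4x4x256
    z = layers.Conv2D(1024, 4, strides=4, padding="valid", activation="relu")(x)             # 1x1x1024 bottleneck
    x = layers.Conv2DTranspose(256, 4, strides=4, padding="valid", activation="relu")(z)     # 4x4x256
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)      # 8x8x128
    outputs = layers.Conv2D(3, 3, padding="same")(x)                                         # 8x8x3 prediction
    return tf.keras.Model(inputs, outputs)

model = build_baseline_autoencoder()
model.compile(optimizer="adam", loss="mse")   # L2 reconstruction loss
```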
Using Batch Normalization on each layer also yielded significant performance improvements (around +15%) with similar training times; the difference would probably be lower if we waited until convergence in both cases. Qualitatively, as discussed before, blurriness is an issue: colors are often right, but details and textures are lost, and predicted sections are usually blobs that do not blend very well into the image. We want to reduce the visible continuity errors between the predicted section and the context by predicting a patch that slightly overlaps with the context and by adding a very strong penalty to the loss on this overlap.
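As an aside, a minimal sketch of one such convolution block with Batch Normalization inserted before the ReLU nonlinearity (a common placement; our exact configuration may differ):

```python
from tensorflow.keras import layers

def conv_bn_relu(x, filters, stride=1):
    """One possible 3x3 convolution -> Batch Normalization -> ReLU block."""
    x = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```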

Figure 4. Example results from the vanilla CNN approach

5. Generative Adversarial Networks (GANs)

Our initial model worked fairly well in terms of L2 loss. However, output images were always blurry and lacking details (see Figure 4). The L2 loss encourages the network to take little risk and make safe predictions: no hard shapes, no major changes throughout the patch. This is because outputting a blurry, mean-color image based on the context guarantees a relatively low error, since it makes big mistakes impossible. Therefore, although good in terms of L2 norm, the outputs are not realistic to a human eye.

5.1. DCGANs

In order to force our network to take more risks, we decided to try adversarial networks. The idea is to reproduce the judgment of the human eye: if the missing part is a car window, predicting a car window even if it does not perfectly fit the car is better than predicting a blurry red patch with a black indistinct shape in the middle. An average blurry image will never be realistic. The main idea in GANs is to train in parallel a discriminator network (D) that learns to assess how real an image looks. The goal of the discriminator is to distinguish between real patches and generated ones. It is trained on a dataset composed of real and generated sections with corresponding labels (1 for a real image, 0 for an artificial one). We then penalize our generator (G) by increasing its loss if the output image is considered artificial by the discriminator. In order to fool the discriminator, the generator has to output real-looking, sharp and well-defined images. The discriminator in turn improves by training on those more sophisticated examples. Hopefully this feedback loop continues, with both networks improving by trying to fool each other. The L2 loss will not necessarily improve because our generator might make larger mistakes than before. However, images will look more real, and that is what matters in the end: if the predicted car window does not look at all like the ground truth, the L2 loss will be high, but as long as the insertion is believable in the context of the car, this is a satisfying result.

The new loss is L = \alpha L_{rec} + (1 - \alpha) L_{disc}, where L_{rec} is the L2 loss defined above and L_{disc} is a sigmoid cross-entropy on the probabilities output by the discriminator for the generated images:

L_{disc} = -\sum_i \log(p_i) = -\sum_i \log(D(\hat{Y}^{(i)}))

Since \alpha changes the scale of the loss, we cannot use the loss itself to optimize this crucial hyperparameter. This issue is connected to the aforementioned fact that a lower L2 does not necessarily mean a better, sharper visual result. We choose \alpha to optimize the visual quality of our samples and to ensure convergence.

The discriminator is a 6-CONV network, with every other CONV having a stride of 2, followed by a final CONV1 reducing the depth to 1; the output is a scalar. One can see the discriminator as performing a classification task. Hence, several state-of-the-art models could be adapted to work as a discriminator. TensorFlow offers a vast zoo of pretrained models that we leveraged to improve our discriminator: VGG, Inception, ResNet. Those models typically take larger inputs (ImageNet, for instance), so we tried padding our images, but results were poor. Hence, we adapted and retrained those models to work on our smaller inputs. We reduced the depth by removing all the first layers with dimensions higher than 32x32 and injected our input into the first smaller layers.
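For reference, a hedged TensorFlow sketch of this combined generator objective (the function name and the value of alpha are illustrative, not our exact implementation):

```python
import tensorflow as tf

def generator_loss(y_true, y_pred, disc_logits_on_fake, alpha=0.99):
    """Combined objective L = alpha * L_rec + (1 - alpha) * L_disc.

    L_rec is the pixel-wise L2 reconstruction loss; L_disc is the sigmoid
    cross-entropy pushing the discriminator's output on generated patches
    towards the "real" label (equivalently, maximizing log D)."""
    l_rec = tf.reduce_mean(tf.square(y_true - y_pred))
    l_disc = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.ones_like(disc_logits_on_fake),
            logits=disc_logits_on_fake))
    return alpha * l_rec + (1.0 - alpha) * l_disc
```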
As our input contains much less information, we also decreased the number of filters to avoid overfitting and improve training time. The same holds for the encoder: we tested the same variety of high-performing models.

Figure 5. Example results from the DCGAN approach

5.2. Learning tricks

We have not been able to properly train a GAN. Visually, predicted images present colored artifacts (see Figure 5). Upon careful analysis, shapes and color gradients appear to correspond to the context, but colors are deformed (it is particularly visible on the plane). Because we have not been able to get good convergence on the GANs, the L2 loss is higher than for vanilla autoencoders. With our current architecture, randomness plays a major role: the exact same code will sometimes converge and sometimes diverge. GANs are unstable and very sensitive to the network architectures, and finding the right hyperparameters and architecture details is tedious. The other common issue is that either the discriminator or the generator gets much better than the other and outpaces, overwhelms it.

For instance, if the discriminator gets too strong, the generator is unable to fool it at all, and the adversarial loss skyrockets and plateaus at high values that conceal the L2 loss. On the other hand, the generator might become so strong that it can perfectly cheat the discriminator, which stops learning and assigns the exact same probabilities to real and fake images. This case is analogous to not having an adversarial loss (see the previous part). We tried several learning tricks to alleviate those issues:

- Instead of mixing real and fake samples in a batch, we feed two separate, pure batches. This allows clearer, more direct updates for each batch: either we improve our knowledge of real examples or we improve our detection of fakes, but not a confusing mix of the two.

- We use soft and noisy labels (see [13]): true images have a random label between 0.9 and 1.1 and fakes between 0 and 0.2. Labels are also sometimes randomly flipped between classes. Those tricks add noise and increase the robustness of the discriminator.

- Instead of minimizing log(1 - D), where D is the output of the discriminator, we maximize log(D). On this domain, those formulations are equivalent, but the second does not suffer from vanishing gradients early in the training process.

5.3. WGANs

Wasserstein GANs [9] have been proposed to ease the training of GANs. They rely on the Earth-Mover (EM) distance, which is a good metric for learning distributions. WGANs minimize the EM distance (or rather, a sound approximation of it). WGANs hence allow us to estimate the EM metric during training, and empirically this metric correlates well with the visual quality of predictions (contrary to the L2 loss). WGANs do not rely on a realness probability from the discriminator; rather, we call its output a score: it is unbounded and has no meaning per se. In practice, we can just remove the final sigmoid layer in the discriminator. Also, a key assumption of WGANs is that the functions involved are Lipschitz; we use gradient clipping to ensure this. Finally, we do not need to maintain a fine balance between G and D: we can train D to convergence at every step. In practice, we train D 10 times at each iteration.

5.4. Reducing continuity errors

Using WGANs finally allowed us to train a decent adversarial network. However, a very visible problem remained: the border between the context and the reconstruction was very visible. Visually, it is hard to explain why this border appears so clearly. After careful analysis, it turns out that a systematic color mistake along a line of pixels is very visible, whereas a random error blends in (see Figure 2). Even good predictions presented this border issue.

Figure 6. Output without (left) and with (right) the overlap trick

We use a trick that consists in predicting a 16x16 center square. This square includes our 8x8 original target and 4 pixels of overlap on each side. The reconstruction L2 loss on the overlap is penalized 20 times more, but it is easy for the network to predict it correctly since it is part of its input. This trick greatly improves the quality of the border merging, because the network makes sure there is no major discontinuity within its output. Thus, since the overlap will be very close to the real context, by preventing discontinuities between the overlap and the target we prevent discontinuities between the context and the target. We only keep the 8x8 target when displaying the results.
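A minimal NumPy sketch of this weighted reconstruction loss, assuming the network predicts a 16x16x3 patch whose inner 8x8 region is the true target (the factor of 20 is the one quoted above; the function name is illustrative):

```python
import numpy as np

def overlap_weighted_l2(target16, pred16, overlap_weight=20.0):
    """L2 loss over a 16x16 predicted patch where the 4-pixel overlap band
    around the central 8x8 target is penalized `overlap_weight` times more."""
    weights = np.full((16, 16, 1), overlap_weight)
    weights[4:12, 4:12, :] = 1.0          # central 8x8 target, normal weight
    sq_err = (target16.astype(float) - pred16.astype(float)) ** 2
    return float(np.mean(weights * sq_err))
```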
Before using this trick, we could not feed the entire image to the discriminator because it would use the obvious border to determine whether the image was real or fake. With less visible borders, we have been able to feed the entire image to the discriminator and let it assess the image as a whole. Indeed, whether it comes from a real image or not, an 8x8 square usually has little meaning by itself, hence the difficulty for the discriminator to make a decision; with the entire image, the judgment is easier.

6. Density-based models

6.1. Key idea

Instead of using supervised techniques to fill in a localized gap in the center of the image, we decided to implement unsupervised models as well. The idea here is to estimate the probability mass function p of images as a product of conditional probabilities. Hence, given an image x of shape n x n, whose pixels are {x_1, x_2, ..., x_{n^2}}, the probability mass function is given by

p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})

where p(x_i \mid x_1, \ldots, x_{i-1}) is the probability that the i-th pixel takes the value x_i given that the i-1 previous pixels take the values x_1, \ldots, x_{i-1}.

6 (x i,r, x i,g, x i,b ) and x <i = (x 1,.., x i 1 ), we can write p(x i x <i ) = p(x i,r x <i )p(x i,g x <i, x i,r ) p(x i,b x <i, x i,r, x i,g ) Thus our final probability p(x) is the product of 3n 2 terms and when reconstructing an image, our goal is to fill in the hole with the pixels by maximizing that probability. Our approach is a greedy one, which means that assuming we have learned the characteristics of the function p and are given an image with m missing pixels at positions i 1 < i 2 <... < i m, we will first set the value x i1,r given x <i1, then the value of x i1,g given (x <i1, x i1,r), and eventually we set x im,b given (x <im 1, x im,r, x im,g). Hence, though this approach is easy to implement and is computationally efficient, we do not have any guaranty that our reconstructed image maximizes the likelihood p(x). One obvious thing to do is to define an order on the pixels: from the top-left corner to the bottom-right one, scanning the rows one after the other. Once the order is set, we must choose the kind of conditional probability that we want to learn. We will introduce two different methods, inspired from [4]: a pixelcnn, where the conditional probability of a pixel is defined through a convolution neural network on pixels in the neighborhood, and a flattened row LSTM, which differs a lot from the one proposed in [4], where this probability is the result of the pixels from previous rows fed to an LSTM network. Figure 7. The blue pixels are in the neighborhood of the purple one, and are used to predict the conditional distribution. We then use the same network translated one step to the right to estimate the distribution of the next pixel, etc. The advantage of this method is that the same network (and thus the same parameters) are used when computing each conditional probability. Since we always use the same number of pixels (in our case 24) to compute the distribution of the next pixel, the complexity does not depend on the size of the image. However, with a higher resolution such as images, we might have had to use a larger neighborhood. An example of probability mass function for a given pixel is given in figure 8. We notice that the color of this pixel is slightly unclear to the network, otherwise we would observe a sharper less spread out distribution Pixel CNN One way to model the conditional probability p(x i x <i ) is to use a CNN. We tried several architectures mixing convolutional and leaky ReLU layers, and ending with one fully connected - softmax layer for each of the 3 channels (R,G,B). We hence used the softmax loss function, that is for an image with n 2 pixels with estimated probabilities for the true value ˆp i,true, L softmax = log (ˆp i,true ) n 2 i=1 One could argue that the softmax loss is not relevant as we are not performing a classification task: predicting 115 instead of 116 in not as bad as predicting 115 instead of 230. Indeed, if the true value for x i,r is 117, then a low probability ˆp(x i,r = 117) is not necessarily a problem as long the weight assigned to values in the range [110, 125] is big enough. However, given the large number of training examples compared to the 256 possible values for each channel, this loss provided interesting results. Our final model uses a zone of size 5 5 in the image for the first convolutional layer, located at the top-left side of the pixel, as shown on figure 7. Figure 8. 
Figure 8. Probability mass function of the three channels (R, G, B) of a pixel obtained with the CNN-FC-Softmax network

The results obtained with our Pixel CNN model are not quite as good as the ones we got with GANs when considering the L2 loss, which can be explained by two factors:

- We trained our Pixel CNN with a softmax loss, hence it is not surprising that the achieved L2 loss, 6.98, is higher than the one obtained with GANs.

- The Pixel CNN scans the image from left to right, and the value given to a pixel is determined by the 24 neighbors at its top-left side. Thus the pixels at the bottom of the image are not used when filling the gap.

Figure 9. Original image (left) and reconstructed image (right) using the Pixel CNN model.

This partially explains the results we get in Figure 9. We can clearly see that the bottom and right parts of the image, which are dominated by light colors (light grey, white), have not been taken into account when reconstructing the bird. Indeed, the discontinuity is much more visible at the bottom and right sides of the reconstructed square, and the few dark feathers at the top-left corner of the missing section have been extended over a large part of this square.
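A hedged sketch of the greedy reconstruction loop described in Section 6.1, assuming a `predict_channel(window, channel)` helper (hypothetical name and API) that returns the 256-way distribution produced by a model like the one sketched above:

```python
import numpy as np

def greedy_fill(image, mask, predict_channel):
    """Fill masked pixels in raster-scan order, channel by channel.

    `image`: 32x32x3 array with missing pixels zeroed.
    `mask` : 32x32 boolean array, True where the pixel is missing.
    `predict_channel(window, channel)` returns a length-256 probability
    vector given the 5x5 causal window (hypothetical helper).
    Each value is set greedily to the argmax of its predicted distribution,
    so the result is not guaranteed to maximize the joint likelihood."""
    out = image.copy()
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            if not mask[r, c]:
                continue
            window = np.zeros((5, 5, 3))
            r0, c0 = max(r - 4, 0), max(c - 4, 0)
            patch = out[r0:r + 1, c0:c + 1]            # causal neighborhood
            window[5 - patch.shape[0]:, 5 - patch.shape[1]:] = patch
            for ch in range(3):                         # R, then G, then B
                probs = predict_channel(window, ch)
                out[r, c, ch] = int(np.argmax(probs))
                window[-1, -1, ch] = out[r, c, ch]      # condition the next channel
    return out
```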

We also acknowledge the existence of the Row LSTM and Diagonal BiLSTM from the same article. We implemented and ran those methods, which worked well. Since we do not bring any new take on those techniques, we do not include a detailed report. However, given their complexity in terms of implementation and performance, we tried to propose a much simpler architecture that would still bring relatively good performance on CIFAR10 and allow interesting analysis.

6.3. Flattened Row LSTM

We now propose a model which takes into account the information from all the previous rows of the image to compute the probability distribution of a pixel. We call this model the Flattened Row LSTM, and its architecture is displayed in Figure 10.

Figure 10. Flattened Row LSTM Architecture

The whole image is flattened from the top-left corner to the bottom-right one. The pixel channels are fed to an LSTM as 256-dimensional one-hot vectors in the predefined order R, G, B. The hidden dimension is 64, and a fully connected + softmax layer maps the hidden vectors to 256-dimensional output vectors.

Figure 11. Original image (left) and reconstructed image (right) using the Flattened Row LSTM model.

Figure 11 shows an example of an image reconstructed with our Flattened Row LSTM. The result does not look as good as the one obtained with the Pixel CNN model. We can give several explanations:

- The LSTM scans the rows in an order that gives more influence to the pixels located immediately to the left of the reconstructed pixel. Whereas the Pixel CNN used a dense neighborhood of 24 pixels at the top-left corner of the pixel, this is no longer the case here.

- The strength of the LSTM model is that it captures all the information from the previous rows before making a prediction. However, the current model is probably too simple to make sense of this information. By using several LSTM networks or by increasing the hidden dimension, we might be able to make more accurate predictions.

7. Results

7.1. Vanilla Autoencoders

We used the Adam optimizer for training and cross-validated the learning rate. Using learning rate decay proved useful in all our tests to combine fast initial learning with a smaller, later-stage decrease in loss. We have not been able to cross-validate the decay rate and decay factor, since such a process is complex. We used dropout at 0.5 on every activation, and our trials did not show significant changes in accuracy within the reasonable range of values (40% - 80%) for the drop probability.

We studied the effect of the bottleneck size on our basic architecture, as it is one of the most important parameters. That said, the best results do not stem from the basic architecture.

Bottleneck size | Train L2 Loss | Val L2 Loss

We can see that overfitting is an issue. Reducing the bottleneck size (hence the fully connected layer size) greatly decreases the number of parameters, since this layer accounts for most of the parameters of the model. Overfitting decreases as the bottleneck shrinks from 2000; an intermediate size seems to be the sweet spot, while 256 is too small and probably not expressive enough. We already use early stopping and training-variable averaging (i.e. we do not use the final variables but a fading average of them throughout training, which increases stability).
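A hedged TensorFlow sketch of this training setup: Adam with exponential learning-rate decay and an exponential moving average of the variables. All numeric values are illustrative placeholders, not the cross-validated ones.

```python
import tensorflow as tf

# Learning-rate decay: fast initial learning, smaller later-stage steps.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,   # illustrative; cross-validated in practice
    decay_steps=10_000,
    decay_rate=0.9)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Variable averaging: keep a fading (exponential moving) average of the
# weights during training and evaluate with the averaged weights.
ema = tf.train.ExponentialMovingAverage(decay=0.999)

def train_step(model, x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(y - model(x, training=True)))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    ema.apply(model.trainable_variables)   # update the averaged copies
    return loss
```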

We plan to further combat overfitting by using a larger dataset and/or data augmentation, implementing explicit L2 regularization, and experimenting with DropConnect (dropping random CNN connections at train time, similar to dropout).

Model      | L2 Loss
L2 CNN     | 7.49
WGAN       | 4.26
Pixel CNN  | 6.98
Row LSTM   | 7.63

Table 1. L2 losses on the test set

7.2. WGANs

As mentioned above, contrary to our vanilla autoencoder, GANs do not optimize only for the L2 loss. Theoretically, L2 results should not be better for GANs. That said, we spent much more time working on GANs because our best L2-only architecture gave very poor results visually: the optimized loss function and the visual quality do not align. Hence, our GANs have better results as a consequence of this experimental bias. Since we have not been able to properly train a DCGAN, we only report results for our implementation of WGAN, which works. The score reported in the paper was a mistake: it corresponds to a coefficient of 0 for the adversarial loss (hence it is a vanilla CNN). The best results have been obtained with a VGG-like architecture. Again, we are not claiming that this is the best architecture, as we have not been able to spend as much time on the alternatives.

7.3. Density methods

Our Pixel CNN conditional probability function uses several convolutional and dropout layers. We did not use any pooling layers, since they did not seem to improve the quality of our reconstructed images. Since the parameters of the CNN are shared across all the pixels, training the Pixel CNN comes down to performing a classification task using a 24-pixel zone (5x5 - 1, as explained previously) as input and an integer in the range [0, 255] as target. We used the Adam optimizer with a cross-entropy loss and cross-validated the learning rate. The dropout rate was set to 50%. Our Flattened Row LSTM model achieved a quadratic loss of 7.63 on the test set, though it was trained with the softmax cross-entropy loss. The hidden dimension of 64 was cross-validated, but due to computation time we could not test as many values as we would have wanted. With several LSTM networks, we probably could have obtained a better L2 loss.

7.4. Comparison

The results of the best run for each type of model on the test set are presented in Table 1. A few held-out results from our best run are presented in Figure 12.

Figure 12. Held-out results of our best run

8. Conclusion

We built a relatively efficient standalone CNN-based image inpainter. We identified the need for an adversarial loss, which we implemented using a GAN. We applied several tricks to ease GAN training, and extended [3] to use WGANs and to leverage several existing state-of-the-art architectures (VGG, Inception, ResNet). We also propose applying the overlap trick to smooth borders and facilitate discriminator training. We also studied density-based methods: we implemented PixelCNNs from [4] and proposed our own model, the Flattened Row LSTM. We compared the results qualitatively and quantitatively to understand the shortcomings of the models. Overall, we are very satisfied with the results of our best architectures. Future improvements would include transposing our models to larger images. Performance issues aside, this would prove interesting, as large images present more objects at a larger scale and are overall more complicated.
We would also like to work on our Flattened Row LSTM model to make it more symmetric and to improve its generating power.

We would like to thank the teaching team for providing us with this great learning opportunity, and Google for providing the precious GPUs that allowed us to train some of our heavy models.

References

[1] Ian Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks.
[2] Christian Szegedy, Sergey Ioffe & Vincent Vanhoucke. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.
[3] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell & Alexei A. Efros. Context Encoders: Feature Learning by Inpainting.

[4] Google DeepMind. Conditional Image Generation with PixelCNN Decoders.
[5] Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson & Minh N. Do. Semantic Image Inpainting with Perceptual and Contextual Losses.
[6] H. Noori, S. Saryazdi & H. Nezamabadi-pour. A Convolution Based Image Inpainting (2010).
[7] Xiao-Jiao Mao, Chunhua Shen & Yu-Bin Yang. Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections.
[8] Alhussein Fawzi, Horst Samulowitz, Deepak Turaga & Pascal Frossard. Image Inpainting Through Neural Networks Hallucinations.
[9] Martin Arjovsky, Soumith Chintala & Leon Bottou. Wasserstein GAN.
[10] Diederik P. Kingma & Max Welling. Auto-Encoding Variational Bayes.
[11] Chao Yang et al. High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis.
[12] Tijana Ružić & Aleksandra Pižurica. Context-Aware Patch-Based Image Inpainting Using Markov Random Field Modeling.
[13] Salimans et al. Improved Techniques for Training GANs.


More information

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

More information

arxiv: v1 [cs.cv] 25 Aug 2018

arxiv: v1 [cs.cv] 25 Aug 2018 Painting Outside the Box: Image Outpainting with GANs Mark Sabini and Gili Rusak Stanford University {msabini, gilir}@cs.stanford.edu arxiv:1808.08483v1 [cs.cv] 25 Aug 2018 Abstract The challenging task

More information

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Kihyuk Sohn 1 Sifei Liu 2 Guangyu Zhong 3 Xiang Yu 1 Ming-Hsuan Yang 2 Manmohan Chandraker 1,4 1 NEC Labs

More information

LSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia

LSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 1 LSTM for Language Translation and Image Captioning Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 2 Part I LSTM for Language Translation Motivation Background (RNNs, LSTMs) Model

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Unsupervised Learning of Spatiotemporally Coherent Metrics

Unsupervised Learning of Spatiotemporally Coherent Metrics Unsupervised Learning of Spatiotemporally Coherent Metrics Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun arxiv 2015. Presented by Jackie Chu Contributions Insight between slow feature

More information

CS231N Project Final Report - Fast Mixed Style Transfer

CS231N Project Final Report - Fast Mixed Style Transfer CS231N Project Final Report - Fast Mixed Style Transfer Xueyuan Mei Stanford University Computer Science xmei9@stanford.edu Fabian Chan Stanford University Computer Science fabianc@stanford.edu Tianchang

More information

Deep Learning With Noise

Deep Learning With Noise Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu

More information

Convolutional Neural Networks

Convolutional Neural Networks NPFL114, Lecture 4 Convolutional Neural Networks Milan Straka March 25, 2019 Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise

More information

Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning

Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning BARAK OSHRI and NISHITH KHANDWALA We present Emel, a new framework for training baseline supervised

More information

INTRODUCTION TO DEEP LEARNING

INTRODUCTION TO DEEP LEARNING INTRODUCTION TO DEEP LEARNING CONTENTS Introduction to deep learning Contents 1. Examples 2. Machine learning 3. Neural networks 4. Deep learning 5. Convolutional neural networks 6. Conclusion 7. Additional

More information

Introduction to GAN. Generative Adversarial Networks. Junheng(Jeff) Hao

Introduction to GAN. Generative Adversarial Networks. Junheng(Jeff) Hao Introduction to GAN Generative Adversarial Networks Junheng(Jeff) Hao Adversarial Training is the coolest thing since sliced bread. -- Yann LeCun Roadmap 1. Generative Modeling 2. GAN 101: What is GAN?

More information

arxiv: v2 [cs.cv] 14 May 2018

arxiv: v2 [cs.cv] 14 May 2018 ContextVP: Fully Context-Aware Video Prediction Wonmin Byeon 1234, Qin Wang 1, Rupesh Kumar Srivastava 3, and Petros Koumoutsakos 1 arxiv:1710.08518v2 [cs.cv] 14 May 2018 Abstract Video prediction models

More information

Fei-Fei Li & Justin Johnson & Serena Yeung

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 9-1 Administrative A2 due Wed May 2 Midterm: In-class Tue May 8. Covers material through Lecture 10 (Thu May 3). Sample midterm released on piazza. Midterm review session: Fri May 4 discussion

More information

Neural Networks: promises of current research

Neural Networks: promises of current research April 2008 www.apstat.com Current research on deep architectures A few labs are currently researching deep neural network training: Geoffrey Hinton s lab at U.Toronto Yann LeCun s lab at NYU Our LISA lab

More information

INF 5860 Machine learning for image classification. Lecture 11: Visualization Anne Solberg April 4, 2018

INF 5860 Machine learning for image classification. Lecture 11: Visualization Anne Solberg April 4, 2018 INF 5860 Machine learning for image classification Lecture 11: Visualization Anne Solberg April 4, 2018 Reading material The lecture is based on papers: Deep Dream: https://research.googleblog.com/2015/06/inceptionism-goingdeeper-into-neural.html

More information

Model Generalization and the Bias-Variance Trade-Off

Model Generalization and the Bias-Variance Trade-Off Charu C. Aggarwal IBM T J Watson Research Center Yorktown Heights, NY Model Generalization and the Bias-Variance Trade-Off Neural Networks and Deep Learning, Springer, 2018 Chapter 4, Section 4.1-4.2 What

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Machine Learning. MGS Lecture 3: Deep Learning

Machine Learning. MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer

More information

Using Machine Learning for Classification of Cancer Cells

Using Machine Learning for Classification of Cancer Cells Using Machine Learning for Classification of Cancer Cells Camille Biscarrat University of California, Berkeley I Introduction Cell screening is a commonly used technique in the development of new drugs.

More information

Markov Random Fields and Gibbs Sampling for Image Denoising

Markov Random Fields and Gibbs Sampling for Image Denoising Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov

More information

Deep Learning for Embedded Security Evaluation

Deep Learning for Embedded Security Evaluation Deep Learning for Embedded Security Evaluation Emmanuel Prouff 1 1 Laboratoire de Sécurité des Composants, ANSSI, France April 2018, CISCO April 2018, CISCO E. Prouff 1/22 Contents 1. Context and Motivation

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

Report: Privacy-Preserving Classification on Deep Neural Network

Report: Privacy-Preserving Classification on Deep Neural Network Report: Privacy-Preserving Classification on Deep Neural Network Janno Veeorg Supervised by Helger Lipmaa and Raul Vicente Zafra May 25, 2017 1 Introduction In this report we consider following task: how

More information

Deep Learning for Visual Manipulation and Synthesis

Deep Learning for Visual Manipulation and Synthesis Deep Learning for Visual Manipulation and Synthesis Jun-Yan Zhu 朱俊彦 UC Berkeley 2017/01/11 @ VALSE What is visual manipulation? Image Editing Program input photo User Input result Desired output: stay

More information

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi Mask R-CNN By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi Types of Computer Vision Tasks http://cs231n.stanford.edu/ Semantic vs Instance Segmentation Image

More information

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application

More information

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

More information

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 ECCV 2016 Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 Fundamental Question What is a good vector representation of an object? Something that can be easily predicted from 2D

More information