CS 229 Final Report: Artistic Style Transfer for Face Portraits

Daniel Hsu, Marcus Pan, Chen Zhu
{dwhsu, mpanj, chen0908}@stanford.edu
Dec 16, 2016

1 Introduction

The goal of our project is to learn the content and style representations of face portraits, and then to combine them to produce new pictures. The content features of a face are those that identify it, such as the outline shape. The stylistic features are the artistic characteristics of a particular portrait or painting, such as brush strokes or background color. We forward-pass a content image and several style images through a CNN to extract the desired content and style features. We then initialize a white-noise image and perform gradient descent on its pixels until it matches the desired style and content features.

2 Related Work

In a recent paper, Gatys et al. [1] performed style transfer using a CNN. Their algorithm combined one content image with one style image well, but it was computationally expensive, requiring a full gradient descent loop through a pre-trained CNN for every image generated. Johnson et al. [2] developed an algorithm that performs style transfer with a single forward pass through a CNN; this approach requires training a separate style-transfer network for each desired style. Dumoulin et al. [3] then extended this approach by generalizing the style-transfer network to handle multiple styles. In view of our limited time and computational resources, we decided to base our work on Gatys' algorithm, with modifications specific to faces. We used ideas from the eigenface algorithm [4] to generate average style representations.

3 Model and Platform

We use a pre-trained CNN for face classification, VGG-Face [5]. This CNN is very deep (27 layers), which is useful for extracting content features, as described below. For style input, we pulled several images from various art sources, such as Picasso, Van Gogh, and anime, from the Internet.

As our platform we chose MatConvNet, a MATLAB toolbox implementing convolutional neural networks. It is efficient, simple, and supports many pre-trained network models, and it provides GPU acceleration functions. There are existing implementations of the artistic style transfer algorithm based on Torch, Caffe, and other deep learning toolboxes, but none based on MatConvNet; we hope our project can serve as a supplement to the existing implementations.

4 Gradient Descent Loss Functions

We start with a white-noise image x and perform gradient descent on its pixels until it jointly minimizes several loss functions:

L_{total} = a L_{content} + b L_{style} + c L_{regularization}

x := x - \lambda \nabla_x L_{total}

The loss functions are described below.

Content Loss

As the content image progresses into deeper layers of a CNN, it gets down-sampled into smaller and smaller feature maps. This down-sampling captures the details of a section of the image and summarizes that section in a new, down-sampled image. At the deepest layers of the CNN we therefore capture the most significant features of the image, i.e., its content. The content loss is the sum of squared differences of activations between the input white-noise image X and the content image P at a desired deep layer l:

L_{content} = \frac{1}{2} \sum_{i,j,k} (X^{(l)}_{ijk} - P^{(l)}_{ijk})^2

\partial L_{content} / \partial X^{(l)}_{ijk} = (X^{(l)} - P^{(l)})_{ijk} if X^{(l)}_{ijk} > 0, and 0 otherwise.

We then back-propagate this derivative to the original input image.
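For illustration, the content term can be sketched in a few lines of NumPy (our actual implementation uses MatConvNet); backprop_to_pixels and the hyperparameters in the commented loop are hypothetical placeholders for the CNN backward pass:

```python
import numpy as np

def content_loss_and_grad(X_l, P_l):
    """Content loss at layer l and its gradient w.r.t. the activations X_l.

    X_l: activations of the generated image at layer l
    P_l: activations of the content image at the same layer (same shape)
    """
    diff = X_l - P_l
    loss = 0.5 * np.sum(diff ** 2)
    grad = diff * (X_l > 0)  # gradient is zero where the ReLU was inactive
    return loss, grad

# Outline of the optimization loop (backprop_to_pixels is a placeholder for
# the framework's backward pass from this layer gradient to the input pixels):
# x = np.random.randn(224, 224, 3)        # white-noise initialization
# for step in range(num_iters):
#     x -= lr * backprop_to_pixels(x)     # grad of a*L_content + b*L_style + c*L_reg
```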
Figure 1 shows the content representations generated from different deep layers of the network. The representation from the deeper layer is more desirable for our purposes: it preserves the general structure and lines, but does not enforce a strict similarity to the original image.

[Figure 1: Content representations from different layers. (a) Layer 13; (b) Layer 27.]

Style Loss

To compare styles, we first reshape the 3D output of a layer l of the CNN into a 2D matrix F^{(l)}, in which each row is the output of one filter at that layer. From F^{(l)} we form a Gram matrix G^{(l)}:

G^{(l)}_{ij} = \sum_k F^{(l)}_{ik} F^{(l)}_{jk}

G_{ij} is the dot product of rows i and j of F, which measures the correlations between filter responses but does not capture their global arrangement. We compare the Gram matrices of two images to calculate the error between their styles. Let G be the Gram matrix of the input white-noise image and A the Gram matrix of the desired style image, each obtained by passing the image through the CNN. The loss at a desired layer l is

L_{style} = \frac{1}{2} \sum_{i,j} (G^{(l)}_{ij} - A^{(l)}_{ij})^2

From this loss we calculate the gradient with respect to F^{(l)}, the activations at layer l:

\partial L_{style} / \partial F^{(l)}_{ij} = ((F^{(l)})^T (G^{(l)} - A^{(l)}))_{ji} if F^{(l)}_{ij} > 0, and 0 otherwise.

We then use standard back-propagation to calculate the loss derivative with respect to the original input image. The results are shown in Figure 2: the style representations capture the color and aesthetics of the image, but not its content.

[Figure 2: Style representations. (a) Original image; (b) Style representation.]

Regularization Loss

To prevent the algorithm from overfitting the pixels by pushing them outside the valid range (0, 255), we add the following per-pixel loss functions, as described by Mahendran et al. [6]:

L_{size}(x_{i,j}) = x_{i,j}^2

L_{var}(x_{i,j}) = (x_{i,j} - x_{i+1,j})^2 + (x_{i,j} - x_{i,j+1})^2

Figure 3 shows the effect of regularization: the picture without regularization is noisy, because many of its pixels fall out of range and are clipped to (0, 255).

[Figure 3: Regularization. (a) With regularization; (b) Without regularization.]
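As a complement to the formulas above, here is a minimal NumPy sketch of the Gram matrix, the style loss with its layer gradient, and the two per-pixel regularization penalties. It follows the same conventions (F has one filter response per row) but is an illustration, not the MatConvNet code:

```python
import numpy as np

def gram_matrix(F):
    """Gram matrix of a layer: F is 2D with one filter response per row."""
    return F @ F.T

def style_loss_and_grad(F, A):
    """Style loss at one layer and its gradient w.r.t. the activations F.

    F: generated image's activations at layer l, reshaped to 2D
    A: Gram matrix of the style image at the same layer
    """
    G = gram_matrix(F)
    loss = 0.5 * np.sum((G - A) ** 2)
    # (G - A) @ F realizes the ((F^T)(G - A))_{ji} formula above, since
    # G - A is symmetric; the mask zeroes entries where the ReLU was inactive.
    grad = ((G - A) @ F) * (F > 0)
    return loss, grad

def regularization_loss(x):
    """Per-pixel size and variation penalties on a grayscale image x."""
    size = np.sum(x ** 2)
    var = np.sum((x[:-1, :] - x[1:, :]) ** 2) + np.sum((x[:, :-1] - x[:, 1:]) ** 2)
    return size + var
```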
5 Additional Methods

Multiple Style Transfer

We developed the following two algorithms to transfer multiple styles onto a content image (a code sketch of both is given below):

1. Mixed Gradient Descent. A direct way to combine multiple styles is to run gradient descent on their collective error. In our gradient descent, part of the error comes from the style loss, which measures how much the generated image currently differs from the style image being matched. To extend this to multiple images, we simply define the style loss as the sum of the per-image style losses. This yields a more accurate total style loss that accounts for several style images by a particular artist, but the running time grows with each additional style image.

2. Eigengram. Taking ideas from the eigenface algorithm, we generate an eigengram matrix, the principal eigenvector of all the Gram matrices, as follows. Let g_i be the vectorized Gram matrix of style image i, and define G = [g_1 ... g_N]. Calculate u, the principal eigenvector of G G^T. We regard u as the average style representation, and perform gradient descent against it alone.

Figure 4 compares the results of the two methods. The results are similar, but method 2 is much faster to run.

[Figure 4: Comparison of average style. (a) Style image 1; (b) Style image 2; (c) Average image (eigengram); (d) Average image (mixed gradient descent).]
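A sketch of both combination strategies, reusing style_loss_and_grad from the previous snippet; computing the principal eigenvector of G G^T via the SVD of G is our numerical choice here, not a detail fixed by the method:

```python
import numpy as np

def mixed_style_loss(F, style_grams):
    """Method 1: total style loss is the sum of per-style-image losses."""
    total_loss, total_grad = 0.0, np.zeros_like(F)
    for A in style_grams:
        loss, grad = style_loss_and_grad(F, A)  # from the earlier sketch
        total_loss += loss
        total_grad += grad
    return total_loss, total_grad

def eigengram(style_grams):
    """Method 2: principal eigenvector of G G^T, reshaped to a Gram matrix.

    style_grams: list of k x k Gram matrices, one per style image.
    """
    G = np.stack([A.ravel() for A in style_grams], axis=1)  # columns are the g_i
    # The principal eigenvector of G G^T is the first left singular vector of G.
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    u = U[:, 0]
    if u.sum() < 0:  # eigenvector sign is arbitrary; pick the nonnegative-leaning one
        u = -u
    k = style_grams[0].shape[0]
    return u.reshape(k, k)
```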
Gradient Descent

We found standard gradient descent update rules, such as momentum or ADAM, to be relatively slow to converge. Newton's method, which uses the gradient and the inverse Hessian to steer the search direction, converges faster, but computing a Hessian can be costly in both time and space, and some functions do not have second derivatives. L-BFGS is a good alternative to both gradient descent and Newton's method. Unlike Newton's method, L-BFGS uses an approximation of the inverse Hessian: it stores the last m updates of x and of its gradient \nabla f(x), which represent the approximation implicitly. The method is therefore usable even when the problem is large and memory is limited. The product of the approximate inverse Hessian and the gradient is computed as follows:

Algorithm 1: L-BFGS update. At iteration k:
    q = g_k
    for i = k-1, k-2, ..., k-m do
        \alpha_i = \rho_i s_i^T q
        q = q - \alpha_i y_i
    end
    (H_k^0)^{-1} = (s_{k-1}^T y_{k-1}) / (y_{k-1}^T y_{k-1})
    z = (H_k^0)^{-1} q
    for i = k-m, k-m+1, ..., k-1 do
        \beta_i = \rho_i y_i^T z
        z = z + s_i (\alpha_i - \beta_i)
    end
    (H_k)^{-1} g_k = z

where g_k = \nabla f(x_k), s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k, and \rho_k = 1 / (y_k^T s_k). Once we have the approximation of (H_k)^{-1} g_k, we update the input x as follows:

x_{k+1} = x_k - \eta (H_k)^{-1} g_k

where \eta is the learning rate. A comparison of the convergence rates of ADAM and L-BFGS is shown in Figure 5.

[Figure 5: Comparison of optimization methods.]
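The two-loop recursion in Algorithm 1 translates directly into NumPy; this sketch assumes flattened 1D arrays and a history list stored oldest-first:

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion from Algorithm 1: approximate (H_k)^{-1} g_k.

    g: current gradient, flattened to 1D
    s_hist: last m steps s_i = x_{i+1} - x_i, oldest first
    y_hist: last m gradient differences y_i = g_{i+1} - g_i, oldest first
    """
    q = g.copy()
    rho = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    alpha = [0.0] * len(s_hist)
    for i in reversed(range(len(s_hist))):      # newest to oldest
        alpha[i] = rho[i] * (s_hist[i] @ q)
        q -= alpha[i] * y_hist[i]
    # Initial scaling from the most recent curvature pair.
    scale = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    z = scale * q
    for i in range(len(s_hist)):                # oldest to newest
        beta = rho[i] * (y_hist[i] @ z)
        z += s_hist[i] * (alpha[i] - beta)
    return z  # update: x_{k+1} = x_k - eta * z
```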
6 Results

Visualizations of our results are in the appendix at the end. To quantify the performance of our algorithm, we pass each generated image through the face classification network and compare its classification score with that of the original content image:

Image            Class. Score    Correct Class.
Original         0.961           Yes
Warhol Style     0.093           Yes
Anime Style      0.804           Yes
Van Gogh Style   0.852           Yes
Picasso Style    0.697           Yes

All the stylized images were still classified correctly, even though their scores were lower than the original's.
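The evaluation itself is just a forward pass and a score comparison; a minimal sketch, where forward_fn is a hypothetical wrapper around the classifier's forward pass (VGG-Face in our setup) returning softmax probabilities:

```python
import numpy as np

def identity_score(image, forward_fn, true_class):
    """Return the true class's probability and whether it is the argmax.

    forward_fn: hypothetical wrapper around the face classifier's forward
    pass, returning a 1D array of class probabilities for the image.
    """
    probs = forward_fn(image)
    return probs[true_class], int(np.argmax(probs)) == true_class
```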
Quantifying the performance is inherently subjective, but we showed that the generated images still resemble the original image well for most styles.

7 Discussion

This project has demonstrated that it is possible to convert any portrait to a desired style. We started with the equations used to extract content and style from the portrait and style images, and from there wrote code to back-propagate the error to the original image. Gradient descent then let us generate an image that matched both the content and the style as closely as we wanted. Beyond basing the error on content and style differences alone, we also handled overfitting through regularization with per-pixel loss functions. Because the whole computation took quite a while to converge, we implemented L-BFGS, which beat both gradient descent and Newton's method in convergence rate. The last feature we added was the ability to combine multiple style images. Previous work took a single style image and transformed a portrait to that image's style; to represent an artist's style more faithfully, we needed more than a single sample of their work. This led us to develop the mixed gradient descent and eigengram multiple-style-transfer techniques, which achieve a more holistic artistic style transfer. Together, these features allowed us to explore artistic style transfer more deeply than before.

8 Future Work

Potential future projects include applying different style transfers to a single image, such as transferring the face to Picasso while changing the background to Warhol. One approach would separate a portrait into background and facial regions and run our gradient descent algorithm on the two regions separately; combining them would yield an image with two styles in separate regions. We would also like to explore real-time style transfer. A real-time algorithm would open up a whole new set of possibilities, such as transforming a live-action film to anime in real time, or converting a virtual-reality display of the environment to Picasso's artistic style.

References

[1] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2016.
[2] Justin Johnson, Alexandre Alahi, and Fei-Fei Li. Perceptual losses for real-time style transfer and super-resolution. CoRR, abs/1603.08155, 2016.
[3] Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. A learned representation for artistic style. CoRR, abs/1610.07629, 2016.
[4] Matthew Turk and Alex Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, January 1991.
[5] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In British Machine Vision Conference, 2015.
[6] Aravindh Mahendran and Andrea Vedaldi. Understanding deep image representations by inverting them. CoRR, abs/1412.0035, 2014.

Appendix: Visualizations

Original content image: Chow Yun Fat.

[Figure: (a) Original; (b) Content representation.]

[Figures: stylized results in the Anime, Andy Warhol, Picasso, and Van Gogh styles.]